I have a huge bottle neck on my Game Server for the following query which is used to store the current leaderboard.
I am currently only calling this query via cron once every 5 mins, but would love to optimise it enough to be called every minute or when needed.
The query is taking 30 seconds and currently only ~2000 users and 7000 games played (stored in Games and TopPlayerScores). I'm afraid it will only get worse!! Please Help me Overflow-Kenobi! Your my only hope!
SET @rank=0;
INSERT INTO Board (TopScorePKID, GamePKID, UserPKID, UniquePlayerID, PlayerName, TopPlayerScore, Position, Date)
(SELECT bad.ID AS TopScorePKID, bad.GamePKID, bad.UserPKID, bad.UniquePlayerID, bad.PlayerName, bad.TopPlayerScore, @rank:=@rank+1 AS Position, bad.Date
FROM (
SELECT g.GamePKID, g.TopPlayerScore, l.ID, l.UserPKID, u.UniquePlayerID, u.PlayerName, (l.Date) AS Date
FROM Games g, TopPlayerScores l, UserDetails u
WHERE l.GamePKID = g.GamePKID
AND u.UserPKID = l.UserPKID
AND u.SECRET_DETAIL = 0
AND g.TopPlayerScore >= (SELECT DISTINCT k.TopPlayerScore AS Highest
FROM Games k, TopPlayerScores t
WHERE t.UserPKID = l.UserPKID
AND k.GamePKID = t.GamePKID
ORDER BY k.TopPlayerScore DESC
LIMIT 1)
GROUP BY l.UserPKID
ORDER BY g.TopPlayerScore DESC, Date ASC)
AS bad);
Please someone help!! Should I break it up onto views? Or use Inner Join keywords? What is the best approach?
Thanks so much for even looking at this mess :D!
UPDATED 1.0 : EXPLAIN EXTENDED Results:
id select_type table type possible_keys key key_len ref rows filtered Extra 1 PRIMARY ALL NULL NULL NULL NULL 1521 100.00 2 DERIVED l ALL NULL NULL NULL NULL 6923 100.00 Using temporary; Using filesort 2 DERIVED u eq_ref PRIMARY PRIMARY 4 DBNAME.l.UserPKID 1 100.00 Using where 2 DERIVED k eq_ref PRIMARY PRIMARY 4 DBNAME.l.GamePKID 1 100.00 Using where 3 DEPENDENT SUBQUERY t ALL NULL NULL NULL NULL 6923 100.00 Using where; Using temporary; Using filesort 3 DEPENDENT SUBQUERY g eq_ref PRIMARY PRIMARY 4 DBNAME.t.GamePKID 1 100.00 Using where
UPDATED 2.0: Limited Schema for querying tables
Using the Games to store game scores and other information about a game
`Games` (
`GamePKID` int(11) NOT NULL AUTO_INCREMENT,
`TopPlayerScore` int(11) NOT NULL,
`OTHER_MISC_STUFF_REMOVED` int(11) NOT NULL
PRIMARY KEY (`GamePKID`)
)
Using the following to link the user to the Game and store the time/date
`TopPlayerScores` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`UserPKID` int(11) NOT NULL,
`GamePKID` int(11) NOT NULL,
`Date` datetime NOT NULL,
PRIMARY KEY (`ID`)
)
Used to store each unique player
`UserDetails` (
`UserPKID` int(11) NOT NULL AUTO_INCREMENT,
`UniquePlayerID` char(40) NOT NULL,
`PlayerName` char(96) NOT NULL,
`SECRET_DETAIL` tinyint(1) NOT NULL DEFAULT '0',
`isPlayer` tinyint(4) DEFAULT NULL,
PRIMARY KEY (`UserPKID`)
)
The first thing I would note, although this will not improve performance is that the JOIN syntax you are using was replaced more than 20 years ago by the ANSI 92 expcict join syntax, it is entirely subject of course, but Aaron Bertrand explains some very good reasons to switch to the newer syntax.
The second thing to note is that your results will be non deterministic. You are selecting columns not contained in either an aggregate or a group by. While MySQL allows this, you are not using the feature as MySQL intended. The MySQL docs state:
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
However, some of the columns you have included (g.GamePKID
, g.TopPlayerScore
, l.ID
, l.Date
), do not satisfy the condition are the same for each group, therefore, as stated, MySQL is free to choose whichever values it likes, and even though you have ORDER BY g.TopPlayerScore DESC, Date ASC
this does not influence the single row for each group MySQL chooses.
Thirdly, MySQL has limitations with correlated subqueries and it this can hinder performance. If you can change these to JOINs you should see a performance improvement.
With all this in mind I would rewrite your query as so:
SET @rank=0;
INSERT INTO Board (TopScorePKID, GamePKID, UserPKID, UniquePlayerID, PlayerName, TopPlayerScore, Position, Date)
SELECT bad.ID AS TopScorePKID,
bad.GamePKID,
bad.UserPKID,
bad.UniquePlayerID,
bad.PlayerName,
bad.TopPlayerScore,
@rank:=@rank+1 AS Position,
bad.Date
FROM ( SELECT g.GamePKID,
g.TopPlayerScore,
l.ID,
l.UserPKID,
u.UniquePlayerID,
u.PlayerName,
l.Date
FROM Games g
INNER JOIN TopPlayerScores l
ON l.GamePKID = g.GamePKID
INNER JOIN UserDetails u
ON u.UserPKID = l.UserPKID
INNER JOIN
( SELECT TopPlayerScores.UserPKID, MAX(games.TopPlayerScore) AS MaxPlayerScore
FROM TopPlayerScores
INNER JOIN Games
ON Games.GamePKID = TopPlayerScores.GamePKID
GROUP BY TopPlayerScores.UserPKID
) MaxScore
ON MaxScore.UserPKID = l.UserPKID
AND MaxScore.MaxPlayerScore = g.TopPlayerScore
WHERE u.SECRET_DETAIL = 0
) AS bad
ORDER BY bad.TopPlayerScore DESC, bad.Date ASC;
The subquery MaxScore
should have the effect of limiting the results to one row per player (only their highest score), although additional logic may be required to handle ties (e.g. where the player has the same topscore in more than one game). Without knowing exact requirements I can't correct this.
EDIT
In order to remove duplicates where a player has the same top score on 2 or more games, and make it truly deterministic you need to add a further subquery:
SET @rank=0;
SELECT bad.ID AS TopScorePKID,
bad.GamePKID,
bad.UserPKID,
bad.UniquePlayerID,
bad.PlayerName,
bad.TopPlayerScore,
@rank:=@rank+1 AS Position,
bad.Date
FROM ( SELECT Games.GamePKID,
Games.TopPlayerScore,
TopPlayerScores.ID,
TopPlayerScores.UserPKID,
UserDetails.UniquePlayerID,
UserDetails.PlayerName,
TopPlayerScores.Date
FROM Games
INNER JOIN TopPlayerScores
ON TopPlayerScores.GamePKID = Games.GamePKID
INNER JOIN UserDetails
ON UserDetails.UserPKID = TopPlayerScores.UserPKID
INNER JOIN
( SELECT TopPlayerScores.UserPKID, MAX(games.TopPlayerScore) AS TopPlayerScore
FROM TopPlayerScores
INNER JOIN Games
ON Games.GamePKID = TopPlayerScores.GamePKID
GROUP BY TopPlayerScores.UserPKID
) MaxScore
ON MaxScore.UserPKID = TopPlayerScores.UserPKID
AND MaxScore.TopPlayerScore = Games.TopPlayerScore
INNER JOIN
( SELECT TopPlayerScores.UserPKID, games.TopPlayerScore, MAX(Date) AS Date
FROM TopPlayerScores
INNER JOIN Games
ON Games.GamePKID = TopPlayerScores.GamePKID
GROUP BY TopPlayerScores.UserPKID, games.TopPlayerScore
) MaxScoreDate
ON MaxScoreDate.UserPKID = TopPlayerScores.UserPKID
AND MaxScoreDate.TopPlayerScore = Games.TopPlayerScore
AND MaxScoreDate.Date = TopPlayerScores.Date
WHERE UserDetails.SECRET_DETAIL = 0
) AS bad
ORDER BY bad.TopPlayerScore DESC, bad.Date ASC;
N.B. This query will become much simpler if/when MySQL introduce analytical functions such as ROW_NUMBER()
, or if you switch to a DBMS that already supports them, so just in case either of these things happen, here is a solution using ROW_NUMBER()`
SELECT bad.ID AS TopScorePKID,
bad.GamePKID,
bad.UserPKID,
bad.UniquePlayerID,
bad.PlayerName,
bad.TopPlayerScore,
ROW_NUMBER() OVER(ORDER BY TopPlayerScore DESC) AS Position,
bad.Date
FROM ( SELECT Games.GamePKID,
Games.TopPlayerScore,
TopPlayerScores.ID,
TopPlayerScores.UserPKID,
UserDetails.UniquePlayerID,
UserDetails.PlayerName,
TopPlayerScores.Date,
ROW_NUMBER(PARTITION BY UserDetails.UserPKID
ORDER BY Games.TopPlayerScore DESC,
TopPlayerScores.Date DESC) AS RN
FROM Games
INNER JOIN TopPlayerScores
ON TopPlayerScores.GamePKID = Games.GamePKID
INNER JOIN UserDetails
ON UserDetails.UserPKID = TopPlayerScores.UserPKID
WHERE UserDetails.SECRET_DETAIL = 0
) AS bad
WHERE bad.RN = 1
ORDER BY bad.TopPlayerScore DESC, bad.Date ASC;