I have a MySQL query that selects the number of clicks for each hour of a day. The query worked well until we had a lot of click entries in our database. Now it sometimes needs several seconds (up to 9!) to return the data...
The query is:
SELECT h.clickHour, COUNT(clicktime) AS c
FROM ( SELECT 0 AS clickHour
UNION ALL SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
UNION ALL SELECT 7
UNION ALL SELECT 8
UNION ALL SELECT 9
UNION ALL SELECT 10
UNION ALL SELECT 11
UNION ALL SELECT 12
UNION ALL SELECT 13
UNION ALL SELECT 14
UNION ALL SELECT 15
UNION ALL SELECT 16
UNION ALL SELECT 17
UNION ALL SELECT 18
UNION ALL SELECT 19
UNION ALL SELECT 20
UNION ALL SELECT 21
UNION ALL SELECT 22
UNION ALL SELECT 23 ) AS h
INNER JOIN links l ON l.user_id = 1
LEFT OUTER
JOIN clicks
ON EXTRACT(HOUR FROM clicks.clicktime) = h.clickHour
AND DATE(clicks.clicktime) = '2014-09-21'
AND clicks.link_id = l.id
GROUP
BY h.clickHour
I use these unions because I need the clicks for every hour, including empty hours... Please help!
OK, so we are talking about 0 to several thousand rows for the clicks table. The click time is saved as a timestamp and every click has a unique id. I see that the union approach is bad and that I have to change it.
What I am trying now is to select all clicks of a day grouped by HOUR(clicktime). But when I do so, I get too many results, about 10x more than there should be.
I'd rewrite the query like this:
SELECT h.clickHour
, IFNULL(d.clickCount,0) AS c
FROM ( SELECT 0 AS clickHour UNION ALL SELECT 1 UNION ALL SELECT 2
UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
UNION ALL SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 11
UNION ALL SELECT 12 UNION ALL SELECT 13 UNION ALL SELECT 14
UNION ALL SELECT 15 UNION ALL SELECT 16 UNION ALL SELECT 17
UNION ALL SELECT 18 UNION ALL SELECT 19 UNION ALL SELECT 20
UNION ALL SELECT 21 UNION ALL SELECT 22 UNION ALL SELECT 23
) h
LEFT
JOIN ( SELECT EXTRACT(HOUR FROM c.clicktime) AS clickHour
, SUM(1) AS clickCount
FROM clicks c
JOIN links l
ON l.user_id = 1
AND l.id = c.link_id
WHERE c.clicktime >= '2014-09-21'
AND c.clicktime < '2014-09-21' + INTERVAL 1 DAY
GROUP BY EXTRACT(HOUR FROM c.clicktime)
) d
ON d.clickHour = h.clickHour
The approach here is to get the inline view query d to return a maximum of 24 rows. This cranks through the clicks table to get the counts. We're going to defer the join operation to the fixed set of 24 rows until after we have calculated the hourly counts. (The join to h is there only to get rows with zero counts returned, which would otherwise just be "missing" rows.)
You can test the performance of the inline view query d and of the entire query; I suspect there won't be much difference. The cost of materializing the inline view h isn't that much (there's some overhead, but it's very likely that it will use the Memory storage engine; it's small enough and it should be a simple integer datatype). And that join operation of 24 rows to 24 rows won't be that expensive, even without any indexes available.
I suspect that the majority of the time will be spent materializing the derived table d.
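To check that suspicion, the inline view can be run on its own and timed against the full query. This is just the d subquery from above lifted out unchanged, so it assumes the same clicks and links tables and the same date literal:

```sql
-- The derived table d, run standalone to measure its cost in isolation.
SELECT EXTRACT(HOUR FROM c.clicktime) AS clickHour
     , SUM(1) AS clickCount
  FROM clicks c
  JOIN links l
    ON l.user_id = 1
   AND l.id = c.link_id
 WHERE c.clicktime >= '2014-09-21'
   AND c.clicktime <  '2014-09-21' + INTERVAL 1 DAY
 GROUP BY EXTRACT(HOUR FROM c.clicktime);
```

If this takes about as long as the whole query, the join to h is not where the time goes.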
We're going to want an index with a leading column of clicktime, so that we can use a more efficient index range scan operation, to avoid evaluating expressions for every flipping row in the table.
I changed this predicate: DATE(clickTime) = '2014-09-21' into predicates that reference the bare column. This enables MySQL to consider an efficient range scan operation on the clickTime column (to quickly eliminate a boatload of rows from consideration), rather than requiring that MySQL evaluate a function on every flipping row in the table.
Some performance gain may be obtained by making covering indexes available on the clicks and links tables (so that the query can be satisfied from the indexes, without a need to visit pages in the underlying table).
At a minimum on the clicks table:
ON clicks (clickTime, link_id)
If id is unique (or the primary key) on the links table, this index may not give any performance benefit:
ON links (id, user_id)
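Spelled out as full statements, the two index definitions above might look like this. The index names clicks_ix1 and links_ix1 are placeholders I made up; pick whatever fits your naming scheme:

```sql
-- Covering index for the range scan on clicktime plus the join column.
CREATE INDEX clicks_ix1 ON clicks (clicktime, link_id);

-- Probably redundant if links.id is already the primary key or unique.
CREATE INDEX links_ix1 ON links (id, user_id);
```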
If a covering index is used, the EXPLAIN output should show "Using index".
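One way to check is to prefix the inline view query with EXPLAIN and look at the Extra column for each table. The exact plan depends on your MySQL version, data, and indexes, so I'm not showing expected output here:

```sql
-- Look for "Using index" in the Extra column, and a "range" access type on clicks.
EXPLAIN
SELECT EXTRACT(HOUR FROM c.clicktime) AS clickHour
     , SUM(1) AS clickCount
  FROM clicks c
  JOIN links l
    ON l.user_id = 1
   AND l.id = c.link_id
 WHERE c.clicktime >= '2014-09-21'
   AND c.clicktime <  '2014-09-21' + INTERVAL 1 DAY
 GROUP BY EXTRACT(HOUR FROM c.clicktime);
```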
I don't see a way around the "Using filesort" operation, not without adding a column to the clicks table that stores the clickTime truncated to the hour. With a column like that, and an appropriate index, it's possible that we could get the GROUP BY operation optimized using the index, avoiding the "Using filesort" operation.
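A rough sketch of that idea follows. The column name click_hour and the index name clicks_ix2 are hypothetical, and new rows would also need the column populated (by the application or a trigger, not shown). Whether the optimizer actually skips the filesort depends on the plan it chooses, so verify with EXPLAIN:

```sql
-- Hypothetical column holding clicktime truncated to the hour.
ALTER TABLE clicks ADD COLUMN click_hour DATETIME NULL;

-- Backfill existing rows.
UPDATE clicks
   SET click_hour = DATE_FORMAT(clicktime, '%Y-%m-%d %H:00:00');

CREATE INDEX clicks_ix2 ON clicks (click_hour, link_id);

-- Replacement for the inline view d, grouping on the indexed column.
SELECT EXTRACT(HOUR FROM c.click_hour) AS clickHour
     , SUM(1) AS clickCount
  FROM clicks c
  JOIN links l
    ON l.user_id = 1
   AND l.id = c.link_id
 WHERE c.click_hour >= '2014-09-21'
   AND c.click_hour <  '2014-09-21' + INTERVAL 1 DAY
 GROUP BY c.click_hour;
```

Since the range covers a single day, grouping by click_hour still yields at most 24 rows, one per hour.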
Have you indexed?
Clicks table: clicktime, link_id
Links table: id, user_id