MySQL SELECT query performance has degraded

I have a MySQL query that selects the clicks for each hour of a day. This query worked well until we had a lot of click entries in our database. Now it sometimes needs several seconds (up to 9!) to return the data...

The query is:

SELECT h.clickHour, COUNT(clicktime) AS c
      FROM ( SELECT 0 AS clickHour
             UNION ALL SELECT 1
             UNION ALL SELECT 2
             UNION ALL SELECT 3
             UNION ALL SELECT 4
             UNION ALL SELECT 5
             UNION ALL SELECT 6
             UNION ALL SELECT 7
             UNION ALL SELECT 8
             UNION ALL SELECT 9
             UNION ALL SELECT 10
             UNION ALL SELECT 11
             UNION ALL SELECT 12
             UNION ALL SELECT 13
             UNION ALL SELECT 14
             UNION ALL SELECT 15
             UNION ALL SELECT 16
             UNION ALL SELECT 17
             UNION ALL SELECT 18
             UNION ALL SELECT 19
             UNION ALL SELECT 20
             UNION ALL SELECT 21
             UNION ALL SELECT 22
             UNION ALL SELECT 23 ) AS h
    INNER JOIN links l ON l.user_id = 1
    LEFT OUTER
      JOIN clicks
        ON EXTRACT(HOUR FROM clicks.clicktime) = h.clickHour
          AND DATE(clicks.clicktime) = '2014-09-21'
          AND clicks.link_id = l.id
    GROUP
        BY h.clickHour

I use these unions because I need a row for every hour, including hours with no clicks... Please help!

OK, so we are talking about 0 to several thousand rows for the clicks table. The click time is stored as a timestamp and every click has a unique id. I see that the union approach is bad and that I have to change it.

What I am trying now is to select all clicks of a day grouped by HOUR(clicktime). But when I do so, I get too many results, about 10x more than there should be.

I'd rewrite the query like this:

SELECT h.clickHour
     , IFNULL(d.clickCount,0) AS c
  FROM ( SELECT 0 AS clickHour UNION ALL SELECT  1 UNION ALL SELECT  2
           UNION ALL SELECT  3 UNION ALL SELECT  4 UNION ALL SELECT  5
           UNION ALL SELECT  6 UNION ALL SELECT  7 UNION ALL SELECT  8
           UNION ALL SELECT  9 UNION ALL SELECT 10 UNION ALL SELECT 11
           UNION ALL SELECT 12 UNION ALL SELECT 13 UNION ALL SELECT 14
           UNION ALL SELECT 15 UNION ALL SELECT 16 UNION ALL SELECT 17
           UNION ALL SELECT 18 UNION ALL SELECT 19 UNION ALL SELECT 20
           UNION ALL SELECT 21 UNION ALL SELECT 22 UNION ALL SELECT 23 
       ) h
  LEFT
  JOIN ( SELECT EXTRACT(HOUR FROM c.clicktime) AS clickHour
              , SUM(1) AS clickCount
           FROM clicks c
           JOIN links l
             ON l.user_id = 1
            AND l.id = c.link_id
          WHERE c.clicktime >= '2014-09-21'
            AND c.clicktime <  '2014-09-21' + INTERVAL 1 DAY 
          GROUP BY EXTRACT(HOUR FROM c.clicktime)
       ) d
    ON d.clickHour = h.clickHour

The approach here is to get the inline view query d to return a maximum of 24 rows. This cranks through the clicks table to get the counts. We're going to defer the join operation to the fixed set of 24 rows until after we have calculated the hourly counts. (The join to h is there only to get rows with zero counts returned, which would otherwise just be "missing" rows.)

You can test the performance of the inline view query d and of the entire query; I suspect there won't be much difference. The cost of materializing the inline view h isn't that much (there's some overhead, but it will very likely use the MEMORY storage engine; it's small enough, and it's a single simple integer column). And that join operation of 24 rows to 24 rows won't be that expensive, even without any indexes available.

I suspect that the majority of time will be in materializing the derived table d.
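To check that suspicion, the inline view d can be run on its own and its timing compared with the full query. The statement below is just d lifted out of the query above, with nothing new added:

```sql
-- The derived table d, run standalone so it can be timed separately.
-- Returns at most 24 rows: one per hour that has at least one click.
SELECT EXTRACT(HOUR FROM c.clicktime) AS clickHour
     , SUM(1) AS clickCount
  FROM clicks c
  JOIN links l
    ON l.user_id = 1
   AND l.id = c.link_id
 WHERE c.clicktime >= '2014-09-21'
   AND c.clicktime <  '2014-09-21' + INTERVAL 1 DAY
 GROUP BY EXTRACT(HOUR FROM c.clicktime);
```

If this takes about as long as the whole query, the join to h is not where the time goes.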

We're going to want an index with a leading column of clicktime, so that we can use a more efficient index range scan operation and avoid evaluating expressions for every flipping row in the table.

I changed this predicate: DATE(clicktime) = '2014-09-21' into predicates that reference the bare column. This enables MySQL to consider an efficient range scan operation on the clicktime column (to quickly eliminate a boatload of rows from consideration), rather than requiring that MySQL evaluate a function on every flipping row in the table.
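A quick way to see the difference between the two predicate forms is to compare their EXPLAIN output side by side. This is only a sketch against the clicks table from the question; the exact plans depend on the indexes and MySQL version in use.

```sql
-- Function wrapped around the column: an index on clicktime
-- generally cannot be used for a range scan here.
EXPLAIN
SELECT COUNT(*)
  FROM clicks c
 WHERE DATE(c.clicktime) = '2014-09-21';

-- Bare column compared against a half-open range: with an index that
-- leads on clicktime, the optimizer can consider a range scan.
EXPLAIN
SELECT COUNT(*)
  FROM clicks c
 WHERE c.clicktime >= '2014-09-21'
   AND c.clicktime <  '2014-09-21' + INTERVAL 1 DAY;
```

Both statements select the same set of rows; only the second form leaves the column bare so the index can be searched by range.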

Some performance gain may be obtained by making covering indexes available on the clicks and links tables (so that the query can be satisfied from the indexes, without a need to visit pages in the underlying table.)

At a minimum on the clicks table:

ON clicks (clickTime, link_id)

If id is unique (or primary key) on the links table, this index may not give any performance benefit:

ON links (id, user_id)

If a covering index is used, the EXPLAIN output should show "Using index".
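Spelled out as full statements, the two index suggestions above might look like the following. The index names clicks_ix1 and links_ix1 are made up for illustration; pick whatever fits your naming scheme.

```sql
-- Hypothetical index names; the column lists are the ones given above.
CREATE INDEX clicks_ix1 ON clicks (clicktime, link_id);

-- Likely redundant if id is already the primary key or unique on links.
CREATE INDEX links_ix1 ON links (id, user_id);

-- Re-check the plan of the expensive part of the query afterwards and
-- look for "Using index" in the Extra column.
EXPLAIN
SELECT EXTRACT(HOUR FROM c.clicktime) AS clickHour
     , SUM(1) AS clickCount
  FROM clicks c
  JOIN links l
    ON l.user_id = 1
   AND l.id = c.link_id
 WHERE c.clicktime >= '2014-09-21'
   AND c.clicktime <  '2014-09-21' + INTERVAL 1 DAY
 GROUP BY EXTRACT(HOUR FROM c.clicktime);
```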

I don't see a way around the "Using filesort" operation, not without adding a column to the clicks table that stores the clicktime truncated to the hour. With a column like that, and an appropriate index, it's possible that we could get the GROUP BY operation optimized using the index, avoiding the "Using filesort" operation.
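A rough sketch of that idea, in case it's worth trying. The column name click_hour and the index name clicks_ix2 are hypothetical, and the column would have to be kept up to date on every insert (by the application or a trigger, not shown here). Whether the filesort actually disappears depends on the join order the optimizer chooses, so check it with EXPLAIN.

```sql
-- Hypothetical extra column holding clicktime truncated to the hour.
ALTER TABLE clicks ADD COLUMN click_hour DATETIME NULL;

-- One-time backfill for existing rows.
UPDATE clicks
   SET click_hour = DATE_FORMAT(clicktime, '%Y-%m-%d %H:00:00');

-- Hypothetical index leading on the truncated value.
CREATE INDEX clicks_ix2 ON clicks (click_hour, link_id);

-- Hourly counts for one day, grouping on the stored column
-- instead of on an expression.
SELECT HOUR(c.click_hour) AS clickHour
     , COUNT(*) AS clickCount
  FROM clicks c
  JOIN links l
    ON l.user_id = 1
   AND l.id = c.link_id
 WHERE c.click_hour >= '2014-09-21'
   AND c.click_hour <  '2014-09-21' + INTERVAL 1 DAY
 GROUP BY c.click_hour;
```

Within a single day, grouping by click_hour gives the same groups as grouping by the hour number, so this could take the place of the inline view d in the query above.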

Have you indexed?

Clicks table: clicktime, link_id

Links table: id, user_id