Finding the maximum temperature for a given date in SQL

My tables:

hourly_weather                 electrical_readings
----------------               -----------------------
meter | time_read | temp       meter | time      | kwh
----------------               -----------------------
1       1316044800  55         1       1316136250  19.24
1       1316138400  56         1       1316044320  18.29
(...)                          (...)

I want to retrieve two important values from this data:

1) I want the total kWh for a given day

2) And I want the max temperature for that day

The query I'm using takes WAYYYY too long to run (several hours for 100,000 rows of data in both tables), but I can't think of another way to do it.

SELECT * FROM (
SELECT * , SUM(kwh) AS sumkwh, 
           DATE( FROM_UNIXTIME( r.time_read ) ) AS datex, 
           UNIX_TIMESTAMP( DATE( FROM_UNIXTIME( r.time_read ) ) ) AS datey, 
           (
               SELECT MAX( temp )
               FROM hourly_weather hw
               WHERE hw.meter = 1
                 AND time_read >= datey
                 AND time_read < datey + 86400
           ) AS temp
FROM electrical_readings r
WHERE id = 1
GROUP BY datex
) as t1
WHERE t1.temp != '';

SELECT DATE(FROM_UNIXTIME(r.time_read)) AS datex, 
  SUM(r.kwh) AS sumkwh, MAX(hw.temp) AS temp
FROM electrical_readings r
LEFT OUTER JOIN hourly_weather hw
  ON DATE(FROM_UNIXTIME(r.time_read)) = DATE(FROM_UNIXTIME(hw.time_read)) 
  AND hw.meter = 1
WHERE r.id = 1
GROUP BY datex
HAVING temp IS NOT NULL

This will still be a problem for performance, because the join condition compares expressions rather than plain columns. An ordinary column index cannot be used for that, so MySQL has to read every row of both tables and evaluate the expressions before it can tell whether the join is satisfied.

It would therefore be much better if you could add an extra column to both tables for the date (with no time) and index those columns.

ALTER TABLE electrical_readings ADD COLUMN date_read DATE, ADD KEY (date_read);
UPDATE electrical_readings SET date_read = DATE(FROM_UNIXTIME(time_read));

ALTER TABLE hourly_weather ADD COLUMN date_read DATE, ADD KEY (date_read);
UPDATE hourly_weather SET date_read = DATE(FROM_UNIXTIME(time_read));

SELECT r.date_read, 
  SUM(r.kwh) AS sumkwh, MAX(hw.temp) AS temp
FROM electrical_readings r
LEFT OUTER JOIN hourly_weather hw
  ON r.date_read = hw.date_read 
  AND hw.meter = 1
WHERE r.id = 1
GROUP BY r.date_read
HAVING temp IS NOT NULL

In any case, adding SELECT * to either of these queries is not a good idea: any column that is neither aggregated nor named in the GROUP BY comes back with a value from an arbitrary row of each group.


Re your comment, sorry, the sum is multiplied by the number of matching rows in hourly_weather.
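
To make the multiplication concrete, here is a minimal sketch using two scratch tables. The names demo_readings and demo_weather and the values in them are made up for illustration and are not part of your schema.

```sql
-- Hypothetical scratch tables, only for illustrating the row multiplication.
CREATE TEMPORARY TABLE demo_readings (date_read DATE, kwh DECIMAL(10,2));
CREATE TEMPORARY TABLE demo_weather  (date_read DATE, temp INT);

-- Two readings and three weather rows for the same day.
INSERT INTO demo_readings VALUES ('2011-09-15', 10.00), ('2011-09-15', 5.00);
INSERT INTO demo_weather  VALUES ('2011-09-15', 55), ('2011-09-15', 56), ('2011-09-15', 57);

-- Each of the 2 readings pairs with each of the 3 weather rows (6 joined rows),
-- so SUM(kwh) comes out as 45.00 instead of the true 15.00.
-- MAX(temp) is unaffected by the duplication and is still 57.
SELECT r.date_read, SUM(r.kwh) AS sumkwh, MAX(w.temp) AS temp
FROM demo_readings r
LEFT OUTER JOIN demo_weather w ON r.date_read = w.date_read
GROUP BY r.date_read;
```

MAX() survives duplicated rows but SUM() does not, which is why only the kWh total was wrong.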

We can compensate by doing the aggregate for hourly_weather in a derived table subquery.

SELECT r.date_read, 
  SUM(r.kwh) AS sumkwh, hw.temp
FROM electrical_readings r
LEFT OUTER JOIN (
  SELECT date_read, MAX(temp) AS temp
  FROM hourly_weather
  WHERE meter = 1
  GROUP BY date_read) AS hw
    ON r.date_read = hw.date_read 
WHERE r.id = 1
GROUP BY r.date_read
HAVING temp IS NOT NULL

It would be good to create an index on hourly_weather:

ALTER TABLE hourly_weather ADD KEY (date_read, meter, temp);
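
If you want to check whether that index is actually picked up, one option is to run EXPLAIN on the derived-table part of the query. This is only a sketch; the exact plan depends on your MySQL version and data.

```sql
-- Inspect the plan for the aggregate that feeds the derived table.
-- In the output, the `key` column names the index MySQL chose, and
-- "Using index" in the `Extra` column means the query was answered
-- from the index alone, without reading the table rows.
EXPLAIN
SELECT date_read, MAX(temp) AS temp
FROM hourly_weather
WHERE meter = 1
GROUP BY date_read;
```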

I think it would be simpler to calculate both values in separate queries and then join the resulting data sets. You can even define user variables and temporary tables to make things easier:

# Temp variables for the dates
set @t0 = cast('2013-02-01' as date);
set @t1 = cast('2013-02-02' as date);

# Temporary table 1: Sum of KWH
create temporary table temp_sum_kw
    select 
        date(from_unixtime(time_read)) as `date`, sum(kwh) as sum_kwh
    from 
        electrical_readings
    where 
        time_read >= unix_timestamp(@t0)
        and time_read < unix_timestamp(date_add(@t1, interval +1 day))
    group by 
        date(from_unixtime(time_read));
alter table temp_sum_kw
    add index idx_date(`date`);

# Temporary table 2: Max temp
create temporary table temp_max_temperature
    select 
        date(from_unixtime(time_read)) as `date`, max(temp) as max_temp
    from 
        hourly_weather
    where 
        # time_read holds Unix timestamps, so convert the date bounds before comparing
        time_read >= unix_timestamp(@t0)
        and time_read < unix_timestamp(date_add(@t1, interval +1 day))
        and meter = 1
    group by 
        date(from_unixtime(time_read));
alter table temp_max_temperature
    add index idx_date(`date`);

# Put it all together
select 
    m.*, t.max_temp
from
    temp_sum_kw as m
    inner join temp_max_temperature as t on m.`date` = t.`date`;

The reason for using a half-open range in the where conditions (from the start of @t0 up to, but not including, the start of the day after @t1) is to include everything that happens until the last moment of @t1.
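
The same half-open range also works if you only need a single day. A minimal sketch, assuming the time_read column from your table layout and a made-up variable @day:

```sql
# Hypothetical day of interest; any date works here.
set @day = cast('2013-02-01' as date);

# Comparing the raw column against precomputed bounds keeps the condition
# index-friendly, because no function is applied to time_read itself.
select 
    max(temp) as max_temp
from 
    hourly_weather
where 
    meter = 1
    and time_read >= unix_timestamp(@day)
    and time_read < unix_timestamp(date_add(@day, interval +1 day));
```

Keep in mind that unix_timestamp() interprets the date in the session time zone, so the day boundaries follow whatever time zone your connection uses. An index on (meter, time_read) would probably help this query, but that index is my assumption and not something you have mentioned.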

Hope this helps you