混淆Avg()和加入2个表

I've got myself into a bit of a tiss over averaging and joining tables.

Essentially I want to display the average heights of different plant species using Highcharts, pulling the data from a MySQL database. Unfortunately the height data and the species names were setup to be added in different tables.

I've got it working, however when I download the data and find the averages in Excel the figures are different to those being displayed - so I'm obviously not doing it right. I've double checked I'm doing it right in Excel so almost certain it's my MySQL query that's stuffing up.

There's loads of entries in the actual tables, so I've just put an example below.

The query I have at the moment is:

<?php
$result = mysql_query("
SELECT DISTINCT(plant_records.plant_id), ROUND(AVG(plant_records.height),2) as plant_average, plant_list.id, plant_list.plant_species
FROM plant_records
INNER JOIN plant_list
ON plant_records.plant_id=plant_list.id
GROUP BY plant_list.plant_species
")  or die(mysql_error()); 

while ($row = mysql_fetch_array($result)) {
$xAxisValues[] = "'" . $row['plant_species'] . "'";
$AseriesValues[] = $row['plant_average'];
}
?>

Am I doing it right? I found some nice tutorials explaining joins, like this one, but I'm still confused. I'm wondering if I'm averaging before I've joined them, or something??

"plant_id" in the Records table corresponds with "id" in the List table

plant_records:

id  plant_id    date_recorded   height
1   3           01/01/2013      0.2523123
2   1           02/01/2013      0.123
3   3           03/02/2013      0.446
4   3           04/03/2013      0.52
5   1           05/03/2013      0.3
6   2           06/03/2013      0.111
7   2           07/05/2013      0.30
8   4           08/05/2013      0.22564
9   1           09/05/2013      1.27
10  3           10/05/2013      1.8

plant_list:

id  registration_date   contact_name    plant_species   plant_parent
1   01/01/2013          Dave            ilex_prinos     London_Holly
2   02/01/2013          Bill            acer_saccharum  Brighton_Maple
3   01/01/2013          Bob             ilex_prinos     London_Holly
4   04/01/2013          Bruno           junip_communis  Park_Juniper

EDIT: I've tried every possible way of finding the data using Excel (e.g. deliberately not filtering unique IDs, different average types, selecting multiple species, etc) to find the calculation my query is using, but I can't get the same results.

I notice two issues with your query at the moment.

  1. Selecting plant_list.id while having a GROUP BY plant_list.plant_species will not yield anything of interest, due to the fact that MySQL will return an arbitrary id from any of the plants that match each species.

  2. You state that you are only interested in the most recent recording, but nothing in your query reflects that fact.

Given that information, try this query:

SELECT ROUND(AVG(pr.height),2) as plant_average, plant_list.plant_species
FROM plant_records pr
INNER JOIN plant_list
ON pr.plant_id=plant_list.id
WHERE pr.date_recorded = (
    SELECT MAX(pri.date_recorded) FROM plant_records pri
    WHERE pri.plant_id = pr.plant_id
)
GROUP BY plant_list.plant_species

Alternately, if you want just the average heights for a specific date, simply pass that directly into the query, instead of using the subquery.

If we are assuming that plant_id is not the unique identifier - meaning that a single plant_id is only for one single plant of any given species and you want to know what the average height of a single species is you can do this:

SELECT PL.plant_species, ROUND(AVG(PR.height),2) as plant_average
FROM plant_records AS PR
JOIN plant_list AS PL
     ON PR.plant_id=PL.id
GROUP BY PL.plant_species

This will return something like:

plant_species   plant_average
acer_saccharum  0.2100000
ilex_prinos     0.6700000
junip_communis  0.2300000