I've got myself into a bit of a tiss over averaging and joining tables.
Essentially I want to display the average heights of different plant species using Highcharts, pulling the data from a MySQL database. Unfortunately the height data and the species names were setup to be added in different tables.
I've got it working, however when I download the data and find the averages in Excel the figures are different to those being displayed - so I'm obviously not doing it right. I've double checked I'm doing it right in Excel so almost certain it's my MySQL query that's stuffing up.
There's loads of entries in the actual tables, so I've just put an example below.
The query I have at the moment is:
<?php
$result = mysql_query("
SELECT DISTINCT(plant_records.plant_id), ROUND(AVG(plant_records.height),2) as plant_average, plant_list.id, plant_list.plant_species
FROM plant_records
INNER JOIN plant_list
ON plant_records.plant_id=plant_list.id
GROUP BY plant_list.plant_species
") or die(mysql_error());
while ($row = mysql_fetch_array($result)) {
$xAxisValues[] = "'" . $row['plant_species'] . "'";
$AseriesValues[] = $row['plant_average'];
}
?>
Am I doing it right? I found some nice tutorials explaining joins, like this one, but I'm still confused. I'm wondering if I'm averaging before I've joined them, or something??
"plant_id" in the Records table corresponds with "id" in the List table
plant_records:
id plant_id date_recorded height
1 3 01/01/2013 0.2523123
2 1 02/01/2013 0.123
3 3 03/02/2013 0.446
4 3 04/03/2013 0.52
5 1 05/03/2013 0.3
6 2 06/03/2013 0.111
7 2 07/05/2013 0.30
8 4 08/05/2013 0.22564
9 1 09/05/2013 1.27
10 3 10/05/2013 1.8
plant_list:
id registration_date contact_name plant_species plant_parent
1 01/01/2013 Dave ilex_prinos London_Holly
2 02/01/2013 Bill acer_saccharum Brighton_Maple
3 01/01/2013 Bob ilex_prinos London_Holly
4 04/01/2013 Bruno junip_communis Park_Juniper
EDIT: I've tried every possible way of finding the data using Excel (e.g. deliberately not filtering unique IDs, different average types, selecting multiple species, etc) to find the calculation my query is using, but I can't get the same results.
I notice two issues with your query at the moment.
Selecting plant_list.id
while having a GROUP BY plant_list.plant_species
will not yield anything of interest, due to the fact that MySQL will return an arbitrary id
from any of the plants that match each species.
You state that you are only interested in the most recent recording, but nothing in your query reflects that fact.
Given that information, try this query:
SELECT ROUND(AVG(pr.height),2) as plant_average, plant_list.plant_species
FROM plant_records pr
INNER JOIN plant_list
ON pr.plant_id=plant_list.id
WHERE pr.date_recorded = (
SELECT MAX(pri.date_recorded) FROM plant_records pri
WHERE pri.plant_id = pr.plant_id
)
GROUP BY plant_list.plant_species
Alternately, if you want just the average heights for a specific date, simply pass that directly into the query, instead of using the subquery.
If we are assuming that plant_id
is not the unique identifier - meaning that a single plant_id
is only for one single plant of any given species and you want to know what the average height of a single species is you can do this:
SELECT PL.plant_species, ROUND(AVG(PR.height),2) as plant_average
FROM plant_records AS PR
JOIN plant_list AS PL
ON PR.plant_id=PL.id
GROUP BY PL.plant_species
This will return something like:
plant_species plant_average
acer_saccharum 0.2100000
ilex_prinos 0.6700000
junip_communis 0.2300000