The title of this doesn't quite make sense, so I'll do my best to explain.
I have a very large dataset (1000's of rows) in a single table. The data in this table relates to GPS tracking of vehicles. When the vehicle is stationary (Speed=0), the latitude and longitude can vary quite dramatically over a period of 12 hours.
My current SELECT query is this:
$query = "SELECT UUID, UNITID, Truncate(LONGITUDE,6) AS LONGITUDE, Truncate(LATITUDE,6) AS LATITUDE, SPEED, TRACKINGTIME FROM trackpoint_au WHERE SPEED > -1 Order By UnitID, TRACKINGTIME";
The query returns an XML page, via PHP. Built like so:
header("Content-type: text/xml");
// Start XML file, echo parent node
echo '<markers>';
// Iterate through the rows, printing XML nodes for each
while ($row = @mysql_fetch_assoc($result)){
// ADD TO XML DOCUMENT NODE
echo '<marker ';
echo 'unitid="' . $row['UNITID'] . '" ';
echo 'lat="' . $row['LATITUDE'] . '" ';
echo 'lng="' . $row['LONGITUDE'] . '" ';
echo 'spd="' . $row['SPEED'] . '" ';
echo 'time="' . $row['TRACKINGTIME'] . '" ';
echo '/>';
}
// End XML file
echo '</markers>';
The output looks like:
<marker unitid="7711010426" lat="-32.080402" lng="115.854890" spd="0" time="2011-11-30 06:15:00" />
<marker unitid="7711010426" lat="-32.080376" lng="115.854880" spd="0" time="2011-11-30 06:16:00" />
<marker unitid="7711010426" lat="-32.080364" lng="115.854880" spd="0" time="2011-11-30 06:17:00" />
<marker unitid="7711010426" lat="-32.080330" lng="115.854836" spd="0" time="2011-11-30 06:18:00" />
<marker unitid="7711010426" lat="-32.080326" lng="115.854860" spd="1.85" time="2011-11-30 06:20:00" />
<marker unitid="7711010426" lat="-32.080265" lng="115.854890" spd="0" time="2011-11-30 06:21:00" />
<marker unitid="7711010426" lat="-32.080276" lng="115.854920" spd="0" time="2011-11-30 06:22:00" />
<marker unitid="7711010426" lat="-32.080315" lng="115.854900" spd="0" time="2011-11-30 06:23:00" />
<marker unitid="7711010426" lat="-32.080296" lng="115.854866" spd="0" time="2011-11-30 06:24:00" />
My question is this: How can I use PHP OR MYSQL to return the average latitude/longitude of the rows with spd=0?
My resulting data should be like this:
<marker unitid="7711010426" lat="-32.080367" lng="115.8548715" spd="0" time="2011-11-30 06:18:00" />
<marker unitid="7711010426" lat="-32.080326" lng="115.854860" spd="1.85" time="2011-11-30 06:20:00" />
<marker unitid="7711010426" lat="-32.080288" lng="115.854894" spd="0" time="2011-11-30 06:24:00" />
Note, the 'average' row has the LAST timestamp of the rows that have been averaged out.
I have tried to use a 'Group By Speed'. However, this fails to do what I need as it groups ALL the records with an identical speed, not just the ones with a value of 0.
EDIT
Grouping by the UUID as suggested by macek does not help, as the UUID is unique for each row.
<marker time="2011-11-30 06:15:00" spd="0" lng="115.854890" lat="-32.080402" unitid="7711010426" uuid="c6d50454-aa5b-4069-8756-72c787923173"/>
<marker time="2011-11-30 06:16:00" spd="0" lng="115.854880" lat="-32.080376" unitid="7711010426" uuid="be6f9052-ab00-430a-8cec-6abf5051cad1"/>
ANSWER
After posting the question and reading some of the answers below, I managed to put this PHP code together. It loops through all the rows, checks the speed, if the speed is 0, check the next row (until speed<>0) and average out the lat/lng of those points.
for($i=0;$i<$num;$i++){
mysql_data_seek($result,$i);
$row = mysql_fetch_assoc($result);
if ($row['SPEED']==0){
//echo $i . ' spd: '.$row['SPEED'] . '<br />';
$spd0 = true;
$counter = 1;
$lat = $row['LATITUDE'];
$lng = $row['LONGITUDE'];
$i++;
while (($spd0==true) && ($i<$num)){
//echo ' + ' . $i;
mysql_data_seek($result,$i);
$row2 = mysql_fetch_assoc($result);
if (($row2['UNITID']==$row['UNITID']) && ($row2['SPEED']==0)){
$counter++;
$lat = $lat + $row2['LATITUDE'];
$lng = $lng + $row2['LONGITUDE'];
//echo $i . ' spd: '.$row2['SPEED'] . '<br />';
$i++;
}
else{
$spd0=false;
$i--;
}
}
$lat = $lat/$counter;
$lng = $lng/$counter;
// ADD TO XML DOCUMENT NODE
echo '<marker ';
echo 'uuid ="' . $row['UUID'] . '" ';
echo 'unitid="' . $row['UNITID'] . '" ';
echo 'lat="' . $lat . '" ';
echo 'lng="' . $lng . '" ';
echo 'spd="' . $row['SPEED'] . '" ';
echo 'time="' . $row['TRACKINGTIME'] . '" ';
echo '/>';
}
else {
//echo $i;
// ADD TO XML DOCUMENT NODE
echo '<marker ';
echo 'uuid ="' . $row['UUID'] . '" ';
echo 'unitid="' . $row['UNITID'] . '" ';
echo 'lat="' . $row['LATITUDE'] . '" ';
echo 'lng="' . $row['LONGITUDE'] . '" ';
echo 'spd="' . $row['SPEED'] . '" ';
echo 'time="' . $row['TRACKINGTIME'] . '" ';
echo '/>';
}
}
If someone has a more elegant way of checking the next rows, please post it, as always looking for ways to improve my code.
Thanks all!
in addition to the normal use of GROUP BY
and AVG()
you may be interested in Quassnoi's answer to my question here:
GROUP BY for continuous rows in SQL
He posted a very nice solution that also performs very well with many rows.
Think of the speed as a state, and you want to aggregate all continous rows within a time period that have the same speed.
Here is my attempt on rewriting your query using this method:
SELECT
UNITID,
/* we aggregate multiple rows, maybe you want to know which ones..
this one is optional */
CAST(GROUP_CONCAT(UUID SEPARATOR ', ') AS CHAR) AS UUIDS,
/* is group field in the inner subquery, we can just use it
in our select without an aggregate function */
SPEED,
/* very important to select the lowest timestamp -
this is the time when your unit has stopped moving ..
first row with speed=0 */
MIN(TRACKINGTIME) AS TRACKINGTIME,
/* we calc the average on latitude here */
TRUNCATE(AVG(LATITUDE),6) AS LATITUDE,
/* same for longitude */
TRUNCATE(AVG(LONGITUDE),6) AS LONGITUDE,
/* maybe you want to know how many rows with speed 0
are grouped together */
COUNT(UUID) AS AGGREGATE_COUNT
FROM (
SELECT
/* this increases the counter variable @r each time
the state has changed.. when speed of the previous row
was also "0" and is "0" in the current row,
the counter is not increased. -- this is a virtual field
we will use for GROUPing.
@speed is used to remember the speed of the previous
row for comparison in @r to determine if the speed has changed
*/
@r := @r + (@prev_unit != UNITID
OR @prev_speed != 0
OR SPEED != 0) AS gn,
@prev_speed := SPEED AS a_speed,
@prev_unit := UNITID AS a_unit,
tp.*
FROM (
SELECT @r := 0,
@prev_speed := 1,
@prev_unit := ''
) vars,
trackpoint_au tp
ORDER BY
UNITID, TRACKINGTIME
) q
GROUP BY
gn
ORDER BY
UNITID
Test data:
CREATE TABLE `trackpoint_au` (
`uuid` int(11) NOT NULL AUTO_INCREMENT,
`latitude` decimal(10,0) NOT NULL,
`longitude` decimal(10,0) NOT NULL,
`speed` int(11) NOT NULL,
`unitid` int(11) NOT NULL,
`trackingtime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`uuid`)
) ENGINE=MyISAM;
INSERT INTO trackpoint_au (unitid, speed, latitude, longitude, trackingtime) VALUES
(1, 0, 10, 10, NOW()),
(1, 0, 20, 20, NOW()),
(1, 1, 10, 10, NOW()),
(1, 0, 10, 10, NOW()),
(1, 0, 30, 30, NOW()),
(2, 0, 10, 10, NOW()),
(2, 0, 20, 20, NOW()),
(3, 1, 10, 10, NOW()),
(4, 0, 10, 10, NOW()),
(4, 0, 20, 20, NOW()),
(4, 1, 30, 30, NOW()),
(4, 0, 60, 60, NOW()),
(4, 0, 60, 60, NOW());
Result:
+--------+--------+-------+---------------------+-----------+-----------+-----------------+
| UNITID | UUIDS | SPEED | TRACKINGTIME | LATITUDE | LONGITUDE | AGGREGATE_COUNT |
+--------+--------+-------+---------------------+-----------+-----------+-----------------+
| 1 | 2, 1 | 0 | 2011-12-05 09:34:13 | 15.000000 | 15.000000 | 2 |
| 1 | 3 | 1 | 2011-12-05 09:34:13 | 10.000000 | 10.000000 | 1 |
| 1 | 4, 5 | 0 | 2011-12-05 09:34:13 | 20.000000 | 20.000000 | 2 |
| 2 | 6, 7 | 0 | 2011-12-05 09:34:13 | 15.000000 | 15.000000 | 2 |
| 3 | 8 | 1 | 2011-12-05 09:34:13 | 10.000000 | 10.000000 | 1 |
| 4 | 9, 10 | 0 | 2011-12-05 09:34:13 | 15.000000 | 15.000000 | 2 |
| 4 | 11 | 1 | 2011-12-05 09:34:13 | 30.000000 | 30.000000 | 1 |
| 4 | 12, 13 | 0 | 2011-12-05 09:34:13 | 60.000000 | 60.000000 | 2 |
+--------+--------+-------+---------------------+-----------+-----------+-----------------+
This should get you on the right track
-- get average lat/lng for each unitid where speed is 0
select uuid, unitid, avg(lat), avg(lng)
from trackpoint_au
where speed=0
group by uuid, unitid
When using group by
, any selected fields that do not have a composite function being used on them should be added to the group by
statement
added uuid
to SELECT and GROUP BY
I shall rather propose something different. If the change in the Lat and Long are not significant when speed = 0, why are you trying to get the average? You might truncate the last digit or round it up by 5 or 10 for such values.
This will not however give you fewer rows. But you should be able to get consistent values for Lat and Long.
while ($row = @mysql_fetch_assoc($result)){
if( $row['SPEED']!=0){
echo 'list average';
clear list;
// ADD TO XML DOCUMENT NODE
echo '<marker ';
echo 'unitid="' . $row['UNITID'] . '" ';
echo 'lat="' . $row['LATITUDE'] . '" ';
echo 'lng="' . $row['LONGITUDE'] . '" ';
echo 'spd="' . $row['SPEED'] . '" ';
echo 'time="' . $row['TRACKINGTIME'] . '" ';
echo '/>';
} else {
//put data to a list
}
}