I need to go through an array containing points in a map and check their distance from one another. I need to count how many nodes are within 200m and 50m of each one. It works fine for smaller amounts of values. However when I tried to run more values through it (around 4000 for scalability testing) an error occurs saying that I have reached the maximum execution time of 300 seconds. It needs to be able to handle at least this much within 300 seconds if possible.
I have read around and found out that there is a way to disable/change this limit, but I would like to know if there is a simpler way of executing the following code so that the time it takes to run it will decrease.
for($i=0;$i<=count($data)-1;$i++)
{
$amount200a=0;
$amount200p=0;
$amount50a=0;
$amount50p=0;
$distance;
for($_i=0;$_i<=count($data)-1;$_i++)
{
$distance=0;
if($data[$i][0]===$data[$_i][0])
{
}
else
{
//echo "Comparing ".$data[$i][0]." and ".$data[$_i][0]." ";
$lat_a = $data[$i][1] * PI()/180;
$lat_b = $data[$_i][1] * PI()/180;
$long_a = $data[$i][2] * PI()/180;
$long_b = $data[$_i][2] * PI()/180;
$distance =
acos(
sin($lat_a ) * sin($lat_b) +
cos($lat_a) * cos($lat_b) * cos($long_b - $long_a)
) * 6371;
$distance*=1000;
if ($distance<=50)
{
$amount50a++;
$amount200a++;
}
else if ($distance<=200)
{
$amount200a++;
}
}
}
$amount200p=100*number_format($amount200a/count($data),2,'.','');
$amount50p=100*number_format($amount50a/count($data),2,'.','');
/*
$dist[$i][0]=$data[$i][0];
$dist[$i][1]=$amount200a;
$dist[$i][2]=$amount200p;
$dist[$i][3]=$amount50a;
$dist[$i][4]=$amount50p;
//*/
$dist.=$data[$i][0]."&&".$amount200a."&&".$amount200p."&&".$amount50a."&&".$amount50p."%%";
}
Index 0 contains the unique ID of each node, 1 contains the latitude of each node and index 2 contains the longitude of each node.
The error occurs at the second for loop inside the first loop. This loop is the one comparing the selected map node to other nodes. I am also using the Haversine Formula.
first of all, you are performing in big O notation: O(data^2), which is gonna be slow as hell , and really, either there are 2 possible solutions. Find a proven algorithm that solves the same problem in a better time. Or if you cant, start moving stuff out of the innner for loop, and mathmatically prove if you can convert the inner for loop to mostly simple calculations, which is often something you can do.
after some rewriting, I see some possiblities: If $data is not a SPLFixedArray (which has a FAR Better access time, ) then make it. since you are accessing that data so many times (4000^2)*2. secound, write cleaner code. although the optizmier will do its best, if you dont try either to minize the code (which only makes it more readable), then it might not be able to do it as well as possible.
and move intermediate results out of the loops, also something like the size of the array.
Currently you're checking all points against all other points, where in fact you only need to check the current point against all remaining points. The distance from A to B is the same as the distance from B to A, so why calculate it twice?
I would probably make an adjacent array that counts how many nodes are within range of each other, and increment pairs of entries in that array after I've calculated that two nodes are within range of each other.
You should probably come up with a very fast approximation of the distance that can be used to disregard as many nodes as possible before calculating the real distance (which is never going to be super fast).
Generally speaking, beyond algorithmic optimisations, the basic rules of optimisation are:
Don't any processing that you don't have to do: Like not multiplying $distance by 1000. Just change the values you're testing against from 20 and 50 to 0.02 and 0.05, respectively.
Don't call any function more often than you have to: You only need to call count($data) once before any processing starts.
Don't calculate constant values more than once: PI()/180
, for example.
Move all possible processing outside of loops. I.e. precalculate as much as possible.
Another minor point which will make your code a little easier to read:
for( $i = 0; $i <= count( $data ) - 1; $i++ )
is the same as:
for( $i = 0; $i < count( $data ); $i++ )
Try this:
$max = count($data);
$CONST_PI = PI() / 180;
for($i=0;$i<$max;$i++)
{
$amount200a=0;
$amount50a=0;
$long_a = $data[$i][2] * $CONST_PI;
$lat_a = $data[$i][1] * $CONST_PI;
for($_i=0;$_i<=$max;$_i++)
//or use for($_i=($i+1);$_i<=$max;$_i++) if you did not need to calculate already calculated in other direction
{
$distance=0;
if($data[$i][0]===$data[$_i][0]) continue;
$lat_b = $data[$_i][1] * $CONST_PI;
$long_b = $data[$_i][2] * $CONST_PI;
$distance =
acos(
sin($lat_a ) * sin($lat_b) +
cos($lat_a) * cos($lat_b) * cos($long_b - $long_a)
) * 6371;
if ($distance<=0.2)
{
$amount200a++;
if ($distance<=0.05)
{
$amount50a++;
}
}
} // for %_i
$amount200p=100*number_format($amount200a/$max,2,'.','');
$amount50p=100*number_format($amount50a/$max,2,'.','');
$dist.=$data[$i][0]."&&".$amount200a."&&".$amount200p."&&".$amount50a."&&".$amount50p."%%";
} // for $i
It will be better to read I think and if you change the commented out line of the for $_i it will be faster at all :)