In my situation, I have to process large sets of data consisting of dates with corresponding numbers for every date. The intervals between these dates are not static, meaning that some dates are seconds apart while others are days apart. This data has the following format: (Given in PHP array dump format)
Array[2000] {
Array[2] {
Date: 2014-7-7 7:07:07
Data: 29.2934
}
Array[2] {
Date: 2014-7-7 7:08:13
Data: 30.10203
}
Array[2] {
Date: 2014-7-9 3:24:43
Data: 30.10203
}
...
}
I need to find the largest increase in Data between two Dates that fall within some time constraint, such as one day, one week, etc. I've been using this PHP code to accomplish that:
for ($i=0; $i<$numrows-1; $i++) {
for($o=1; $o<($numrows-1)-$i; $o++) {
if((strtotime($dataArray[$i+$o]['Date'])-strtotime($dataArray[$i]['Date']))<86400) { //86400 for constraint of one day
$diffs[$i]['date'] = strtotime($dataArray[$i+$o]['Date']);
$diffs[$i]['data'] = $dataArray[$i+$o]['data']-$dataArray[$i]['Data'];
}
}
}
This returns an array containing the largest end date within the bound for each element of the data array as well as the difference in number between them. I then can search for the greatest data element of the diffs array to determine during which day data increased the most. This works fine, but unfortunately it involves putting my server's CPU at 100% for over 20 seconds for some data sets and causing it to become unresponsive to other requests during this time.
What I'm trying to figure out is how to do this in a more efficient manner; I've done some research and I think this is in O(n^2) time, however I can't seem to find a method that uses a more efficient algorithm. Is there any way to accomplish this same objective using less processing time and system resources?
With a database it would be much easier to implement. Anyway...
I'd add an index (that is an additional array that keeps references to the original data) ordered by delta date and delta data.
Array[2000] {
Array[2] {
DeltaDate: 10
DeltaData: 1.234
Row: ???
}
Array[2] {
DeltaDate: 66
DeltaData: 0.80863
Row: 1
}
Array[2] {
Date: 160000
Data: 0
Row: 2
}
Array[2] {
Date: 160000
Data: 1
Row: ???
}
Array[2] {
Date: 160000
Data: 234
Row: ???
}
...
}
After that I'll lookup in that index the highest record satisfying the time constraint. Given that the index is ordered, you could use a binary search (O=log N).
In an SQL database this would be expressed with the following query:
SELECT * FROM Array
WHERE DeltaDate <= time constraint
ORDER BY DeltaData DESC LIMIT 1