I have an set numeric ranges that I would like to optimize.
Here's a simple example of initial values:
Start End
9 12
1 2
60 88
10 11
79 80
What I'd expect as output after optimization:
Start End
1 2
9 12
60 88
These are the left
and right
values from Modified Preorder Tree Traversal (Nested Set) data stored in a MySQL database. I use them to exclude inactive branches from the result, and am not currently optimizing the ranges at all. I thought I might get a performance gain from optimizing the ranges before use.
MORE INFO
The values are passed into a query for exclusion of the inactive branches in the tree using a NOT BETWEEN
clause. I thought that I could optimize the performance of that query by using a minimal set of ranges.
Here is an SQL that will return what you want
mysql> CREATE TABLE sample (Start INT, End INT);
mysql> INSERT sample VALUES (9,12),(1,2),(60,88),(10,11),(79,80);
mysql> SELECT *
-> FROM sample s
-> WHERE NOT EXISTS (SELECT 1
-> FROM sample
-> WHERE s.Start > Start AND s.Start < End);
+-------+------+
| Start | End |
+-------+------+
| 9 | 12 |
| 1 | 2 |
| 60 | 88 |
+-------+------+
You can, of course, create VIEW, move the data to another table or delete rows using the above SQL.
NOTE: I am not really sure why are you doing this 'optimization'.
EDIT:
The query can be rewritten as
SELECT s.*
FROM sample s LEFT JOIN
sample s2 ON s.Start > s2.Start AND s.Start < s2.End
WHERE s2.start IS NULL;
Which will create different execution plan (2xsimple select vs primary/dependent subquery for EXISTS), so performance might be different. Both queries will use an index on (Start, End) if it exists.
Put them in a sorted list. Mark which elements in the sorted list represent range starts and which are range ends. Sort the list based on value first; however, make sure that range starts come before range ends. (This will probably involve a structure of some sort that can be sorted on a given key. I don't know the details in php.)
Now, traverse the list from start to end. Keep a counter, c
. When you pass a range start, increment c
. When you pass a range end, decrement c
.
When c
goes from 0 to 1, that's the start of a range in the final set. When c
goes from 1 to 0, that's the end of a range.
EDIT:: If you already have the ranges in a database table somewhere, you can probably structure an SQL query to do the first step above (again, making sure that range start-points are returned before range end-points).
Here's a simple implementation:
// I picked this format because it's convenient for the solution
// and because it's very natural for a human to read/write
$ranges = array(
9 => 12,
1 => 2,
60 => 81,
10 => 11,
79 => 88);
ksort($ranges);
$count = count($ranges);
$prev = null; // holds the previous start-end pair
foreach($ranges as $start => $end) {
// If this range overlaps or is adjacent to the previous one
if ($prev !== null && $start <= $prev[1] + 1) {
// Update the previous one (both in $prev and in $ranges)
// to the union of its previous value and the current range
$ranges[$prev[0]] = $prev[1] = max($end, $prev[1]);
// Mark the current range as "deleted"
$ranges[$start] = null;
continue;
}
$prev = array($start, $end);
}
// Filter all "deleted" ranges out
$ranges = array_filter($ranges);
Limitations/notes:
int
.0
. If your data can legitimately contain such a range, provide an appropriate callback to array_filter
: function($item) { return $item === null; }
.$ranges = array(
array(9, 12),
array(1, 2),
array(60, 81),
array(10, 11),
array(79, 88),
);
function optimizeRangeArray($r) {
$flagarr = array();
foreach ($r as $range => $bounds) {
$flagarr = array_pad($flagarr, $bounds[1], false);
for ($i = $bounds[0]-1; $i < $bounds[1]; $i++) $flagarr[$i] = true;
}
$res = array(); $min = 0; $max = 0; $laststate = false;
$ctr = 0;
foreach ($flagarr as $state) {
if ($state != $laststate) {
if ($state) $min = $ctr + 1;
else {
$max = $ctr;
$res[] = array($min, $max);
}
$laststate = $state;
}
$ctr++;
}
$max = $ctr;
$res[] = array($min, $max);
return($res);
}
print_r(optimizeRangeArray($ranges));
Output:
Array
(
[0] => Array
(
[0] => 1
[1] => 2
)
[1] => Array
(
[0] => 9
[1] => 12
)
[2] => Array
(
[0] => 60
[1] => 88
)
)
Note: This doesn't work for negative integers!
Or use it like this
$rres = optimizeRangeArray($ranges);
$out = "<pre>Start End<br />";
foreach($rres as $range=>$bounds) {
$out .= str_pad($bounds[0], 9, ' ') . str_pad($bounds[1], 9, ' ') . "<br />";
}
$out .= "</pre>";
echo $out;
To get this in your browser
Start End
1 2
9 12
60 88