I'm working on a class that makes sorting arrays easier in PHP, and I've been playing with the SORT_ constants. However, the behaviour of SORT_REGULAR
(the default sort type) seems to differ depending on the order in which you add the items to the array. Moreover, I can't spot a pattern that explains why.
Array items:
$a = '0.3';
$b = '.5';
$c = '4';
$d = 'F';
$e = 'z';
$f = 4;
Scenario 1:
$arr = array($d, $e, $a, $f, $b, $c);
sort($arr); // sort() takes its argument by reference, so it needs a variable
var_dump($arr);
// Produces...
array(6) {
[0]=>
string(3) "0.3"
[1]=>
string(2) ".5"
[2]=>
string(1) "4"
[3]=>
string(1) "F"
[4]=>
string(1) "z"
[5]=>
int(4)
}
Scenario 2:
$arr = array($d, $e, $b, $f, $c, $a);
sort($arr); // sort() takes its argument by reference, so it needs a variable
var_dump($arr);
// Produces...
array(6) {
[0]=>
string(3) "0.3"
[1]=>
string(2) ".5"
[2]=>
string(1) "F"
[3]=>
string(1) "z"
[4]=>
int(4)
[5]=>
string(1) "4"
}
Any ideas?
Warning
Be careful when sorting arrays with mixed types values because sort() can produce unpredictable results.
You should pass one of the other SORT_* constants explicitly instead of relying on the default SORT_REGULAR.
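Following that advice, here is a minimal sketch, assuming a plain string ordering is acceptable for the class: both scenarios from the question are sorted with SORT_STRING, which compares every element (including the integer 4) as a string, so the input order no longer changes the result. The variable names $scenario1 and $scenario2 are illustrative only.

```php
<?php
// Values from the question.
$a = '0.3';
$b = '.5';
$c = '4';
$d = 'F';
$e = 'z';
$f = 4;

// sort() takes its argument by reference, so each order is stored in a variable first.
$scenario1 = [$d, $e, $a, $f, $b, $c];
$scenario2 = [$d, $e, $b, $f, $c, $a];

// SORT_STRING compares every element as a string, so 4 is compared as '4'.
sort($scenario1, SORT_STRING);
sort($scenario2, SORT_STRING);

// '4' and 4 are equal as strings, so compare the string forms of both results.
var_dump(array_map('strval', $scenario1) === array_map('strval', $scenario2)); // bool(true)
print_r(array_map('strval', $scenario1));
```

Because '4' and 4 compare equal as strings, their relative position can still depend on the input order, which is why the two results are compared via strval().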
There are a few comments here:
This behavior is "expected" (or at least known) because you use different types for the values (strings and integers). See the manual page for the sort() function:
Warning
Be careful when sorting arrays with mixed types values because sort() can produce unpredictable results.
Most likely, at some point in the sorting algorithm, it compares two values as numbers rather than as strings. To avoid this situation, don't try to sort arrays with values of different types (as the manual says).
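The switch between the two comparison modes is easy to see with two numeric strings. In this minimal sketch (the values '10' and '9' are chosen only for illustration), the default SORT_REGULAR compares them as numbers, while SORT_STRING compares them byte by byte.

```php
<?php
// Both values are numeric strings, so SORT_REGULAR compares them as numbers: 9 < 10.
$regular = ['10', '9'];
sort($regular);

// SORT_STRING compares them as strings: '1' sorts before '9'.
$asStrings = ['10', '9'];
sort($asStrings, SORT_STRING);

echo implode(', ', $regular), "\n";   // 9, 10
echo implode(', ', $asStrings), "\n"; // 10, 9
```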
It seems that SORT_REGULAR follows the same rules as the < operator (and thus, the > operator).
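In my own tests, for any two values $v0 and $v1, the following assertion passed:
$pair = [$v0, $v1];
sort($pair);
// After sorting, the second element is never less than the first
// (a plain $pair[0] < $pair[1] would fail for values that compare equal).
assert(!($pair[1] < $pair[0]));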
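A sketch that repeats this pairwise check for every ordered pair of the question's values. The helper name pair_violations() is hypothetical; it collects any pair where sort() leaves the second element less than the first. Values that compare equal (such as '4' and 4) are not counted as failures.

```php
<?php
// Hypothetical helper: returns every [$v0, $v1] for which sort() disagrees with <.
function pair_violations(array $values): array
{
    $violations = [];
    foreach ($values as $v0) {
        foreach ($values as $v1) {
            $pair = [$v0, $v1];
            sort($pair); // default flag: SORT_REGULAR
            if ($pair[1] < $pair[0]) {
                $violations[] = [$v0, $v1];
            }
        }
    }
    return $violations;
}

// The values from the question.
$values = ['0.3', '.5', '4', 'F', 'z', 4];
echo count(pair_violations($values)), " violation(s)\n";
```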
< is not a strict weak order.
Unfortunately, < applied to a mix of numeric strings, non-numeric strings and integers behaves circularly and is not transitive. Thus, it is not a strict weak order.
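This can be shown by the following assertions, which pass up to PHP 7 (since PHP 8.0, trailing whitespace is allowed in numeric strings, so '2 ' is compared as the number 2 there and the lexicographical cases behave differently):
assert('3' < '10'); // Numeric comparison: 3 < 10.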
assert('10' < '2 '); // Lexicographical comparison
// Circular:
assert('2 ' < '3'); // Lexicographical comparison
// Not transitive:
assert(!('3' < '2 ')); // Lexicographical comparison
// And just because it's interesting:
assert('2 ' < 3); // Numeric comparison. 2 < 3.
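Such a cycle can also be found mechanically. This sketch (find_cycle() is a hypothetical helper) searches a list for three values with $x < $y, $y < $z and $z < $x. It uses '2a' instead of '2 ': since PHP 8.0, strings with trailing whitespace count as numeric, so '2 ' would be compared as the number 2 there, whereas '2a' stays non-numeric in every version.

```php
<?php
// Hypothetical helper: returns [$x, $y, $z] with $x < $y < $z < $x, or null.
function find_cycle(array $values): ?array
{
    foreach ($values as $x) {
        foreach ($values as $y) {
            foreach ($values as $z) {
                if ($x < $y && $y < $z && $z < $x) {
                    return [$x, $y, $z];
                }
            }
        }
    }
    return null;
}

// '3' < '10' (numeric), '10' < '2a' and '2a' < '3' (lexicographical).
var_dump(find_cycle(['3', '10', '2a'])); // a three-element array: '3', '10', '2a'
var_dump(find_cycle([1, 2, 3]));         // NULL
```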
The idea of sorting is that in the sorted list of items, if $i < $j, then !($sorted_items[$j] < $sorted_items[$i]).
This is only possible if < is a strict total order, or a strict weak order.
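The property above can be checked directly. In this sketch, is_consistently_sorted() is a hypothetical helper that tests every pair of positions. It accepts an ordinary integer sort but rejects the result of sorting three strings that form a cycle under < ('3' < '10' numerically, '10' < '2a' and '2a' < '3' lexicographically).

```php
<?php
// Hypothetical helper: true if, for every pair of positions $i < $j,
// the item at $j is not less than the item at $i.
function is_consistently_sorted(array $sorted): bool
{
    $items = array_values($sorted);
    $n = count($items);
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            if ($items[$j] < $items[$i]) {
                return false;
            }
        }
    }
    return true;
}

$ints = [3, 1, 2];
sort($ints);
var_dump(is_consistently_sorted($ints));  // bool(true)

// No order of these three strings can pass the check, so sort() cannot produce one.
$cycle = ['3', '10', '2a'];
sort($cycle);
var_dump(is_consistently_sorted($cycle)); // bool(false)
```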
< reads strings as numbers.
The following rules seem to apply for <:
(*) The string '4' does "look like" the number 4. The string '4 x' does not "look like" the number 4, but (int)'4 x' still evaluates to 4.
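To see which strings "look like" numbers, a small sketch can compare is_numeric() with an explicit (int) cast. numeric_profile() is a hypothetical helper; the cast quietly uses the leading digits, which is why '4 x' yields 4 even though it is not a numeric string.

```php
<?php
// Hypothetical helper: does PHP treat $s as a numeric string, and what does (int) give?
function numeric_profile(string $s): array
{
    return ['is_numeric' => is_numeric($s), 'int_cast' => (int) $s];
}

foreach (['4', '4 x', '0.3', '.5', 'F'] as $s) {
    $p = numeric_profile($s);
    printf(
        "%-6s is_numeric=%-5s (int)=%d\n",
        var_export($s, true),
        var_export($p['is_numeric'], true),
        $p['int_cast']
    );
}
```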
The sort in this specific example is actually consistent in PHP 7, just not in PHP 5.*, as the comparisons below show.
This does not mean that PHP 7 is immune to this problem. It just happens to be ok in this specific example.
Comparing the individual values reveals:
hhvm-3.9.1 - 3.12.0, 7.0.0 - 7.1.0alpha2
'0.3' < '.5'
'0.3' < '4'
'0.3' < 'F'
'0.3' < 'z'
'0.3' < 4
'.5' < '4'
'.5' < 'F'
'.5' < 'z'
'.5' < 4
'4' < 'F'
'4' < 'z'
'4' < 4
'F' < 'z'
'F' < 4
'z' < 4
5.5.0 - 5.6.23
'0.3' < '.5'
'0.3' < '4'
'0.3' < 'F'
'0.3' < 'z'
'0.3' < 4
'.5' < '4'
'.5' < 'F'
'.5' < 'z'
'.5' < 4
'4' < 'F'
'4' < 'z'
4 < '4'
'F' < 'z'
'F' < 4
'z' < 4
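Listings like these can be regenerated by sorting each pair of values with the default flag. This is only an assumption about how they were produced, and pairwise_order() is a hypothetical helper. On PHP 8 the string/integer lines will differ from both listings, because PHP 8.0 changed how integers compare with non-numeric strings (for example, 4 now sorts before 'F').

```php
<?php
// Hypothetical helper: sort each pair with SORT_REGULAR and report which value comes first.
function pairwise_order(array $values): array
{
    $lines = [];
    $n = count($values);
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            $pair = [$values[$i], $values[$j]];
            sort($pair);
            $lines[] = var_export($pair[0], true) . ' < ' . var_export($pair[1], true);
        }
    }
    return $lines;
}

// The question's values, in the order used by the listings.
$values = ['0.3', '.5', '4', 'F', 'z', 4];
echo implode("\n", pairwise_order($values)), "\n";
```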
Now let's remove the boring bits.
In PHP 7: 'F' < 'z' remains. Nothing circular. Hence, there is exactly one sort order possible: ['0.3', '.5', '4', 'F', 'z', 4].
In PHP 5.*:
There are multiple ways to build cycles from these results.
One such cycle is: '4' < 'F', 'F' < 4, 4 < '4'.
This means that in any sort order, there will be at least two positions $i and $j with $i < $j but $sorted[$j] < $sorted[$i].
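The claim can be checked exhaustively for a small cycle. This sketch uses its own three values rather than the PHP 5 cycle above, because that cycle comes from sort() results, while '4' and 4 are equal under < (neither is less than the other). permutations() and has_inversion() are hypothetical helpers; the code tries all six orders of '3', '10' and '2a' and counts those without an inversion.

```php
<?php
// Hypothetical helper: all orderings of a small list.
function permutations(array $items): array
{
    if (count($items) <= 1) {
        return [$items];
    }
    $result = [];
    foreach ($items as $k => $item) {
        $rest = $items;
        unset($rest[$k]);
        foreach (permutations(array_values($rest)) as $perm) {
            $result[] = array_merge([$item], $perm);
        }
    }
    return $result;
}

// Hypothetical helper: true if some later item is less than an earlier one.
function has_inversion(array $order): bool
{
    $n = count($order);
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            if ($order[$j] < $order[$i]) {
                return true;
            }
        }
    }
    return false;
}

// '3' < '10' (numeric), '10' < '2a' and '2a' < '3' (lexicographical): a cycle.
$cycle = ['3', '10', '2a'];
$orders = permutations($cycle);
$withoutInversion = array_filter($orders, function (array $o): bool {
    return !has_inversion($o);
});
echo count($orders), " orders, ", count($withoutInversion), " without an inversion\n"; // 6 orders, 0 without an inversion
```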