Group array elements by letter range

I have a simple array with some names in it, and I want to group them by their first letter. For example, all names starting with A to C go into one array, names starting with D to F go into another, and so on.

Is there a better way to do this than using lots of if/else statements?

I now have four methods to offer. Each one can be adapted to larger or smaller groups by changing $size:

  • $size=2 creates "AB","CD",etc.
  • $size=3 creates "ABC","DEF",etc.
  • $size=4 creates "ABCD","EFGH",etc.
  • $size=15 creates "ABCDEFGHIJKLMNO","PQRSTUVWXYZ"
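You can preview which group keys a given $size produces by splitting the alphabet directly. This uses the same str_split()/range() combination that Code#3 and Code#4 use. The sketch below wraps that in a helper called groupKeys(). The name is hypothetical and does not come from the answers.

```php
<?php
// Hypothetical helper: list the group keys that a given $size produces.
function groupKeys(int $size): array
{
    // range('A', 'Z') yields the 26 uppercase letters; str_split() cuts the
    // joined alphabet into $size-letter pieces (the last piece may be shorter).
    return str_split(implode(range('A', 'Z')), $size);
}

foreach ([2, 3, 4, 15] as $size) {
    echo $size, ' => ', implode(',', groupKeys($size)), PHP_EOL;
}
```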

Code#1 processes the values as an array by using 2 foreach() loops and a comparison on the first character of each value. This is the easiest to comprehend.

$fruits=array("date","guava","lemon","Orange","kiwi","Banana","apple");
natcasesort($fruits);  // pre-sort them for alphabetized output
$size=3;  // <-modify group sizes here
$chunks=array_chunk(range('A','Z'),$size);  // 0=>["A","B","C"],1=>["D","E","F"],etc...
$groups=array();  // initialize so var_export() still works if nothing matches
foreach($fruits as $fruit){
    foreach($chunks as $letters){
        if(in_array(strtoupper($fruit[0]),$letters)){  // check if the capitalized first letter exists in the $letters array
            $groups[implode($letters)][]=$fruit;  // push value into this group
            break;  // go to next fruit/value
        }
    }
}
var_export($groups);
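If you need Code#1 in more than one place, it can be wrapped in a function. The sketch below assumes a hypothetical name, groupByFirstLetter(), and adds one guard that the original lacks: empty strings are skipped, because they have no first character. A value that starts with a non-letter matches no chunk, so it is silently dropped, exactly as in Code#1.

```php
<?php
// Hypothetical reusable wrapper around the Code#1 approach.
function groupByFirstLetter(array $values, int $size = 3): array
{
    natcasesort($values);                           // alphabetize, case-insensitively
    $chunks = array_chunk(range('A', 'Z'), $size);  // [['A','B','C'], ['D','E','F'], ...]
    $groups = [];
    foreach ($values as $value) {
        if ($value === '') {
            continue;                               // no first letter to inspect
        }
        $first = strtoupper($value[0]);
        foreach ($chunks as $letters) {
            if (in_array($first, $letters, true)) {
                $groups[implode($letters)][] = $value;
                break;                              // go to the next value
            }
        }
        // A value starting with a non-letter matches no chunk and is skipped.
    }
    return $groups;
}

var_export(groupByFirstLetter(["date", "guava", "lemon", "Orange", "kiwi", "Banana", "apple"]));
```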

Code#2 combines apokryfos' very clever ord() line with Code#1. It computes the matching chunk directly, which eliminates the non-matching iterations of the inner loop (and the inner loop itself). This improves efficiency at some cost to readability.

$fruits=array("date","guava","lemon","Orange","kiwi","Banana","apple");
natcasesort($fruits);  // pre-sort them for alphabetized output
$size=3;  // <-modify group sizes here
$chunks=array_chunk(range('A','Z'),$size);  // 0=>["A","B","C"],1=>["D","E","F"],etc...
$groups=array();  // initialize the result array
foreach($fruits as $fruit){
    $groups[implode($chunks[floor((ord(strtoupper($fruit[0]))-ord("A"))/$size)])][]=$fruit;
}
var_export($groups);
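Code#2 indexes $chunks directly. That means a value starting with a digit or punctuation would compute an index outside $chunks. The sketch below pulls the same arithmetic into a hypothetical helper called rangeKey(). It returns null for such values so the loop can skip them. The helper and the extra "7up" test value are illustrative assumptions, not part of the answer.

```php
<?php
// Hypothetical helper: the Code#2 arithmetic for a single value, returning null
// (instead of indexing outside $chunks) when the first character is not A-Z/a-z.
function rangeKey(string $value, int $size = 3): ?string
{
    if (!preg_match('/^[a-z]/i', $value)) {
        return null;                                   // empty string, digit, punctuation, ...
    }
    $index = intdiv(ord(strtoupper($value[0])) - ord('A'), $size);  // chunk index
    return implode(array_chunk(range('A', 'Z'), $size)[$index]);
}

$fruits = ["date", "guava", "lemon", "Orange", "kiwi", "Banana", "apple", "7up"];
natcasesort($fruits);
$groups = [];
foreach ($fruits as $fruit) {
    $key = rangeKey($fruit, 3);
    if ($key !== null) {                               // skip values with no letter key
        $groups[$key][] = $fruit;
    }
}
var_export($groups);
```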

Code#3 processes the values as a CSV string by using preg_match_all() and some filtering functions. It assumes that no value contains a comma. In my opinion, this code is hard to comprehend at a glance because of all the nested functions and the very long regex pattern.

$fruits=array("date","guava","lemon","Orange","kiwi","Banana","apple");
natcasesort($fruits);  // pre-sort them for alphabetized output  // array(6 => 'apple',5 => 'Banana',0 => 'date',1 => 'guava',4 => 'kiwi',2 => 'lemon',3 => 'Orange')
$size=3;  // <-modify group sizes here
$chunks=str_split(implode(range('A','Z')),$size);  // ['ABC','DEF','GHI','JKL','MNO','PQR','STU','VWX','YZ']
$regex="/((?<=^|,)[".implode('][^,]*)|((?<=^|,)[',$chunks)."][^,]*)/i";  // '/((?<=^|,)[ABC][^,]*)|((?<=^|,)[DEF][^,]*)|((?<=^|,)[GHI][^,]*)|((?<=^|,)[JKL][^,]*)|((?<=^|,)[MNO][^,]*)|((?<=^|,)[PQR][^,]*)|((?<=^|,)[STU][^,]*)|((?<=^|,)[VWX][^,]*)|((?<=^|,)[YZ][^,]*)/i'
if(preg_match_all($regex,implode(",",$fruits),$out)){
    $groups=array_map('array_values',   // 0-index subarray elements
        array_filter(                   // omit empty subarrays
            array_map('array_filter',   // omit empty subarray elements
                array_combine($chunks,  // use $chunks as keys for $out
                    array_slice($out,1) // remove fullstring subarray from $out
                )
            )
        )
    );
    var_export($groups);
}
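The raw preg_match_all() output shows why Code#3 needs its array_filter() calls. Each capture group gets one entry per match, and a group that did not take part in a match is filled with an empty string. The sketch below uses $size = 15, so there are only two capture groups, and a made-up input string.

```php
<?php
// Same pattern construction as Code#3, with $size = 15 to keep the regex short.
$chunks = str_split(implode(range('A', 'Z')), 15);  // ['ABCDEFGHIJKLMNO', 'PQRSTUVWXYZ']
$regex = "/((?<=^|,)[" . implode('][^,]*)|((?<=^|,)[', $chunks) . "][^,]*)/i";
echo $regex, PHP_EOL;  // /((?<=^|,)[ABCDEFGHIJKLMNO][^,]*)|((?<=^|,)[PQRSTUVWXYZ][^,]*)/i

preg_match_all($regex, "apple,Orange,pear", $out);
// $out[0] = ['apple', 'Orange', 'pear']  (full matches)
// $out[1] = ['apple', 'Orange', '']      (group 1 did not match 'pear')
// $out[2] = ['', '', 'pear']             (group 2 matched only 'pear')
var_export($out);

// The Code#3 post-processing drops the '' entries and re-indexes what is left.
$groups = array_map('array_values', array_filter(array_map('array_filter',
    array_combine($chunks, array_slice($out, 1)))));
var_export($groups);
```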

Code#4 processes the values as an array without loops or conditionals. It uses array_map(), preg_grep(), array_values(), array_combine(), and array_filter() to form a one-liner (not counting the $size and $chunks declarations). ...I don't like to stop until I've produced a one-liner, no matter how ugly. ;)

$fruits=array("date","guava","lemon","Orange","kiwi","Banana","apple");
natcasesort($fruits);  // pre-sort them for alphabetized output
$size=3;  // <-modify group sizes here
$chunks=str_split(implode(range('A','Z')),$size);  // ['ABC','DEF','GHI','JKL','MNO','PQR','STU','VWX','YZ']
$groups=array_filter(array_combine($chunks,array_map(function($v)use($fruits){return array_values(preg_grep("/^[$v].*/i",$fruits));},$chunks)));
var_export($groups);


// $groups=array_filter(  // remove keys with empty subarrays
//            array_combine($chunks,  // use $chunks as keys and subarrays as values
//                array_map(function($v)use($fruits){ // check every chunk
//                    return array_values(  // reset subarray's keys
//                        preg_grep("/^[$v].*/i",$fruits)  // create subarray of matches
//                    );
//                },$chunks)
//            )
//        );
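preg_grep() keeps the keys of its input array, which is why Code#4 wraps it in array_values(). The sketch below demonstrates that. It then rewrites the same one-liner with an arrow function, which is an alternative syntax rather than the author's code. Arrow functions (fn) are available from PHP 7.4 and capture $fruits automatically, so the use clause is no longer needed.

```php
<?php
$fruits = ["date", "guava", "lemon", "Orange", "kiwi", "Banana", "apple"];
natcasesort($fruits);

// preg_grep() keeps the keys from the sorted array: 6 ('apple') and 5 ('Banana').
var_export(preg_grep('/^[ABC]/i', $fruits));

$size = 3;
$chunks = str_split(implode(range('A', 'Z')), $size);
// Arrow function version of Code#4; $fruits is captured by value automatically.
$groups = array_filter(array_combine(
    $chunks,
    array_map(fn($v) => array_values(preg_grep("/^[$v]/i", $fruits)), $chunks)
));
var_export($groups);
```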

All four snippets output an identical result:

array (
  'ABC' => 
  array (
    0 => 'apple',
    1 => 'Banana',
  ),
  'DEF' => 
  array (
    0 => 'date',
  ),
  'GHI' => 
  array (
    0 => 'guava',
  ),
  'JKL' => 
  array (
    0 => 'kiwi',
    1 => 'lemon',
  ),
  'MNO' => 
  array (
    0 => 'Orange',
  ),
)

You can do this:

function buckets($array, callable $bucketFunc) {
    $buckets = [];

    foreach ($array as $val) {
        $bucket = $bucketFunc($val);
        if (!isset($buckets[$bucket])) {
            $buckets[$bucket] = [];
        }
        $buckets[$bucket][] = $val;
    }
    return $buckets;
}

function myBucketFunc($value) {
    // Gets the index of the first character and returns which triple of letters it belongs to
    return floor((ord(ucfirst($value)) - ord("A"))/3);
}
$array = [ "Abc", "Cba", "Foo", "Hi", "Bar" ];

$buckets = buckets($array, 'myBucketFunc'); // any callable that maps a value to a bucket key would work
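Any callable that returns a usable array key can be passed to buckets(). The sketch below repeats buckets() from above so it runs on its own. The three bucket functions are illustrative assumptions, not part of the answer, and the arrow functions need PHP 7.4 or later.

```php
<?php
function buckets($array, callable $bucketFunc) {
    $buckets = [];

    foreach ($array as $val) {
        $bucket = $bucketFunc($val);
        if (!isset($buckets[$bucket])) {
            $buckets[$bucket] = [];
        }
        $buckets[$bucket][] = $val;
    }
    return $buckets;
}

$array = ["Abc", "Cba", "Foo", "Hi", "Bar"];

// One bucket per uppercased first letter.
$byLetter = buckets($array, fn($v) => strtoupper($v[0]));
// The same 3-letter grouping as myBucketFunc, but with an int key from intdiv().
$byTriple = buckets($array, fn($v) => intdiv(ord(strtoupper($v[0])) - ord('A'), 3));
// A built-in works too: bucket by string length.
$byLength = buckets($array, 'strlen');

print_r($byLength);
```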

$buckets would then contain (as printed by print_r()):

Array
(
    [0] => Array
        (
            [0] => Abc
            [1] => Cba
            [2] => Bar
        )

    [1] => Array
        (
            [0] => Foo
        )

    [2] => Array
        (
            [0] => Hi
        )

)

Further clarification:

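ord() returns the byte value of the first character of a string. For an ASCII letter, this is its ASCII code (ord("A") is 65).

Doing ord("X") - ord("A") therefore gives the zero-based alphabet index of an uppercase letter X (A = 0, B = 1, ..., Z = 25). This is why the value is passed through ucfirst() first.

Dividing that index by 3 and rounding down with floor() gives the bucket number of X if we split the alphabet into buckets of 3 letters each.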
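Here is the arithmetic written out for the letter X. The sketch uses intdiv() in place of floor(), because both give the same result for non-negative integers and intdiv() returns an int. The letter must be uppercase first, since ord('x') is 120, not 88.

```php
<?php
$letter = 'X';                            // already uppercase
$index  = ord($letter) - ord('A');        // 88 - 65 = 23 (zero-based position of X)
$bucket = intdiv($index, 3);              // floor(23 / 3) = 7
$chunks = array_chunk(range('A', 'Z'), 3);
echo implode($chunks[$bucket]), PHP_EOL;  // VWX
```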

This is a good use of array_reduce() to build a non-scalar result (an array of buckets) rather than a single value:

function keyize(string $word, int $stride = 3): string {
    $first = strtoupper($word[0]);  // $word{0} curly-brace offset syntax was removed in PHP 8.0
    $index = (int)floor((ord($first) - ord('A'))/$stride);
    return implode('', array_chunk(range('A', 'Z'), $stride)[$index]);
}

function bucketize(array $words, $stride = 3): array {
    return array_reduce(
        $words,
        function ($index, $word) use ($stride) {
            $index[keyize($word, $stride)][] = $word;
            return $index;
        },
        []
    );
}

$words = [ 'alpha', 'Apple', 'Bravo', 'banana', 'charlie', 'Cucumber', 'echo', 'Egg', ];
shuffle($words);
$buckets = bucketize($words, 3); // change the number of letters per group, e.g. 1, 13, 26
ksort($buckets);
var_dump($buckets);

So we're using array_reduce() to walk, and simultaneously build, the buckets. As implemented, it's not the most efficient approach, because the bucket array is copied through each closure invocation. However, it's compact.
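For comparison, a plain foreach version keeps one local array and appends to it in place. This avoids handing the accumulator to a closure and back on every step. The sketch below has not been benchmarked. bucketizeLoop() is a hypothetical name, and it calls ksort() itself rather than leaving that to the caller. keyize() is repeated with $word[0] so the block runs on its own.

```php
<?php
// Same key function as keyize() above.
function keyize(string $word, int $stride = 3): string {
    $first = strtoupper($word[0]);
    $index = intdiv(ord($first) - ord('A'), $stride);
    return implode('', array_chunk(range('A', 'Z'), $stride)[$index]);
}

// Loop-based variant: appends to one local array instead of threading it
// through a callback, then sorts the bucket keys.
function bucketizeLoop(array $words, int $stride = 3): array {
    $buckets = [];
    foreach ($words as $word) {
        $buckets[keyize($word, $stride)][] = $word;
    }
    ksort($buckets);
    return $buckets;
}

var_dump(bucketizeLoop(['alpha', 'Apple', 'Bravo', 'banana', 'charlie', 'Cucumber', 'echo', 'Egg']));
```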