PHP: matching a string against multiple keyword arrays

I'm writing a basic categorization tool that will take a title and then compare it to an array of keywords. Example:

$cat['dining'] = array('food','restaurant','brunch','meal','cand(y|ies)');
$cat['services'] = array('service','cleaners','framing','printing');
$string = 'Dinner at seafood restaurant';

Are there creative ways to loop through these categories or to see which category has the most matches? Note that in the 'dining' array, I have regex to match variations on the word candy. I tried the following, but with these category lists getting pretty long, I'm wondering if this is the best way:

$keywordRegex = implode("|",$cat['dining']); 
preg_match_all("/\b(?:{$keywordRegex})\b/i", $string, $matches);

Thanks, Steve

EDIT: Thanks to @jmathai, I was able to add ranking:

    $matches = array(); 
    foreach($keywords as $k => $v) {
        str_replace($v, '#####', $masterString,$count);
        if($count > 0){
            $matches[$k] = $count;
        }
    }
    arsort($matches);

This can be done with a single loop.

I would split candy and candies into separate entries for efficiency. A clever trick would be to replace matches with some token. Let's use 10 #'s.

$cat['dining'] = array('food','restaurant','brunch','meal','candy','candies');
$cat['services'] = array('service','cleaners','framing','printing');
$string = 'Dinner at seafood restaurant';

$max = array(null, 0); // category, occurrences
foreach($cat as $k => $v) {
  $replaced = str_replace($v, '##########', $string);
  preg_match_all('/##########/i', $replaced, $matches);
  if(count($matches[0]) > $max[1]) {
    $max[0] = $k;
    $max[1] = count($matches[0]);
  }
}

echo "Category {$max[0]} has the most ({$max[1]}) matches.\n";

$cat['dining'] = array('food','restaurant','brunch','meal');
$cat['services'] = array('service','cleaners','framing','printing');
$string = 'Dinner at seafood restaurant';

$string = explode(' ',$string);
foreach ($cat as $key => $val) {
  $kwdMatches[$key] = count(array_intersect($string,$val));
}
arsort($kwdMatches);

echo "<pre>";
print_r($kwdMatches);
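
One thing to watch with array_intersect() is that it compares whole values case-sensitively, so "Restaurant" or "restaurant," in a title would not count. Below is a minimal sketch of the same idea with the title normalized first. It assumes every keyword is already lowercase, and the function name countWholeWordMatches is made up for illustration.

```php
<?php
// Hypothetical helper: counts whole-word keyword hits per category.
// Assumes every keyword in $cat is already lowercase.
function countWholeWordMatches(array $cat, string $title): array
{
    // Lowercase the title and split on runs of non-word characters,
    // so "Restaurant," and "restaurant" become the same token.
    $words = preg_split('/\W+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);

    $counts = array();
    foreach ($cat as $category => $keywords) {
        $counts[$category] = count(array_intersect($words, $keywords));
    }
    arsort($counts); // highest count first
    return $counts;
}

$cat = array(
    'dining'   => array('food', 'restaurant', 'brunch', 'meal'),
    'services' => array('service', 'cleaners', 'framing', 'printing'),
);
print_r(countWholeWordMatches($cat, 'Dinner at Seafood Restaurant, then brunch'));
```

Because only whole words are compared, "seafood" does not count as a hit for "food" here, and regex entries such as cand(y|ies) are treated as literal strings.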

You are performing an O(n*m) lookup, with n being the total number of keywords across your categories and m being the number of words in a title. You could try organizing them like this:

// Constant names take no "$" prefix in PHP.
const DINING = 0;
const SERVICES = 1;

$categories = array(
    "food" => DINING,
    "restaurant" => DINING,
    "service" => SERVICES,
);

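Then, for each word in a title, check $categories[$word] (guarded with isset(), since most words will not be keywords) to find its category. Each check is a single hash lookup, so this gets you O(m).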
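
A rough sketch of that per-word lookup, turned into a tally so the categories can be ranked. The helper name tallyCategories is hypothetical, and plain strings stand in for the category constants purely for readability. It also assumes each keyword belongs to exactly one category and is an exact lowercase word, so regex entries would not work with this layout.

```php
<?php
// Flat keyword => category map. String labels are an illustrative choice.
$categories = array(
    'food'       => 'dining',
    'restaurant' => 'dining',
    'service'    => 'services',
);

// Hypothetical helper: one hash lookup per title word.
function tallyCategories(array $categories, string $title): array
{
    $tally = array();
    $words = preg_split('/\W+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (isset($categories[$word])) {
            $category = $categories[$word];
            $tally[$category] = ($tally[$category] ?? 0) + 1;
        }
    }
    arsort($tally); // best-matching category first
    return $tally;
}

print_r(tallyCategories($categories, 'Food service at the restaurant'));
```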

Provided the number of keywords is not too large, creating a reverse lookup table (keyword to categories) might be an idea. You build it once, then run each title against it.

// One-time reverse category creation
$reverseCat = array();    
foreach ($cat as $cCategory => $cWordList) {
   foreach ($cWordList as $cWord) {
       if (!array_key_exists($cWord, $reverseCat)) {
           $reverseCat[$cWord] = array($cCategory);
       } else if (!in_array($cCategory, $reverseCat[$cWord])) {
           $reverseCat[$cWord][] = $cCategory;
       }
   }
}

// Processing a title
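// Split on non-word characters so the result holds only words, no whitespace tokens
$stringWords = preg_split('/\W+/', $string, -1, PREG_SPLIT_NO_EMPTY);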

$matchingCategories = array();
foreach ($stringWords as $cWord) {
   if (array_key_exists($cWord, $reverseCat)) {
       $matchingCategories = array_merge($matchingCategories, $reverseCat[$cWord]);
   }
}

$matchingCategories = array_unique($matchingCategories);
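
Note that array_unique() leaves you with the set of matching categories but discards how often each one matched. If ranking is the goal, a possible variation is to count the hits instead. The sketch below assumes lowercase keywords; buildReverseIndex and rankCategories are made-up names for illustration.

```php
<?php
// Hypothetical helper: builds keyword => set of categories.
function buildReverseIndex(array $cat): array
{
    $reverse = array();
    foreach ($cat as $category => $keywords) {
        foreach ($keywords as $keyword) {
            // Keyed by category, so a repeated keyword cannot add duplicates.
            $reverse[$keyword][$category] = true;
        }
    }
    return $reverse;
}

// Hypothetical helper: counts how many title words fall in each category.
function rankCategories(array $reverse, string $title): array
{
    $hits = array();
    $words = preg_split('/\W+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (isset($reverse[$word])) {
            foreach (array_keys($reverse[$word]) as $category) {
                $hits[] = $category;
            }
        }
    }
    $ranked = array_count_values($hits); // category => number of hits
    arsort($ranked);
    return $ranked;
}

$cat = array(
    'dining'   => array('food', 'restaurant', 'brunch', 'meal'),
    'services' => array('service', 'cleaners', 'framing', 'printing'),
);
$reverse = buildReverseIndex($cat);
print_r(rankCategories($reverse, 'Brunch at a restaurant with laundry service'));
```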

Okay, here's my new answer, which lets you use regex in the $cat[n] values. There's one caveat about this code that I can't figure out: for some reason, it fails if you have any kind of metacharacter or character class at the beginning of a $cat[n] value.

Example: .*food will not work, but s.afood, sea.*, or your example of cand(y|ies) will. I figured this would be good enough for you, since the point of the regex is presumably to handle different forms of a word, and the beginnings of words rarely change in that case.

function rMatch ($a,$b) {
  if (preg_match('~^'.$b.'$~i',$a)) return 0;
  if ($a>$b) return 1;
  return -1;
}

$string = explode(' ',$string);
foreach ($cat as $key => $val) {
  $kwdMatches[$key] = count(array_uintersect($string,$val,'rMatch'));
}
arsort($kwdMatches);

echo "<pre>";
print_r($kwdMatches);
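
The trouble with leading metacharacters is likely down to array_uintersect() sorting its inputs and expecting the callback to define a consistent ordering, which a regex test cannot guarantee. One way to sidestep that, sketched below, is to skip the comparison callback and build one combined pattern per category. This assumes the keywords are trusted regex fragments that do not contain the "~" delimiter. The name countRegexMatches and the extra '\w*food' keyword are made up for illustration.

```php
<?php
// Hypothetical helper: counts regex-keyword hits per category.
// Assumes keywords are trusted regex fragments that do not contain "~".
function countRegexMatches(array $cat, string $title): array
{
    $counts = array();
    foreach ($cat as $category => $patterns) {
        // Wrap each fragment in its own group so a "|" inside one keyword
        // cannot leak into its neighbours, then require word boundaries.
        $alternation = '(?:' . implode(')|(?:', $patterns) . ')';
        $regex = '~\b(?:' . $alternation . ')\b~i';
        $counts[$category] = preg_match_all($regex, $title);
    }
    arsort($counts); // highest count first
    return $counts;
}

$cat = array(
    'dining'   => array('food', 'restaurant', 'brunch', 'meal', 'cand(y|ies)', '\w*food'),
    'services' => array('service', 'cleaners', 'framing', 'printing'),
);
print_r(countRegexMatches($cat, 'Candies and seafood at the restaurant'));
```

With the word boundaries in place, a plain 'food' keyword no longer matches inside "seafood", which is why the example adds a '\w*food' fragment to opt in to that explicitly.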