Comparing the popularity of keywords in a string

I want to take a long string (hundreds of thousands of characters) and compare it against an array of keywords to determine which keyword is mentioned most often.

This seems pretty easy, but I am a bit worried about strstr underperforming for this task.

Should I do it in a different way?

Thanks,

I think you can do it a different way, with a single scan; done right, it can give you a dramatic performance improvement.

Create an associative array, where keys are the keywords and values are the occurrences.

Read the string word by word: take each word and put it in a variable. Then check whether it is one of the keywords (there are several ways to do this; the simplest is to query the associative array with isset). When a keyword is found, increment its counter.

PHP implements its arrays as hash tables, so each isset lookup takes constant time on average, no matter how many keywords you have.
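Here is a rough sketch of the single-scan idea, in case it helps. The function name countKeywords and the sample text are made up for illustration, and it assumes that a "word" is a run of ASCII letters, digits, or apostrophes and that matching should be case-insensitive; adjust the pattern if your keywords look different (for multibyte text you would probably want mb_strtolower and a Unicode-aware pattern instead).

```php
<?php
// Hypothetical helper: counts how often each keyword appears as a whole word.
function countKeywords(string $text, array $keywords): array
{
    // Keys are the (lowercased) keywords, values are the occurrence counters.
    $counts = array_fill_keys(array_map('strtolower', $keywords), 0);

    // Assumption: a word is a run of ASCII letters, digits, or apostrophes.
    preg_match_all("/[a-z0-9']+/", strtolower($text), $matches);

    foreach ($matches[0] as $word) {
        // isset() is a hash lookup, so the cost does not grow with the keyword list.
        if (isset($counts[$word])) {
            $counts[$word]++;
        }
    }

    return $counts;
}

$counts = countKeywords(
    'PHP is fast. Perl is fine, but PHP is everywhere.',
    ['php', 'perl', 'ruby']
);

arsort($counts);                            // highest count first
$mostMentioned = array_key_first($counts);  // needs PHP 7.3 or later
```

Note that preg_match_all builds an array of every word in the text, so this trades some memory for simplicity. Also, this only matches single-word keywords; multi-word phrases would need a different approach.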

Parse the words out in linear fashion. For each word you encounter, increment its count in the associative array of words you are looking for (skipping those you aren't interested in, of course). This will be much faster than running strstr over the whole string once per keyword.
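Something along these lines would do the linear parse with strtok, which hands back one word at a time instead of building an array of all the words first. Treat it as a sketch: tallyKeywords is a made-up name, the delimiter list is only a guess at what separates words in your text, and the comparison is case-sensitive as written.

```php
<?php
// Hypothetical helper: walks the text token by token and tallies keywords.
function tallyKeywords(string $text, array $keywords): array
{
    // Only the words we care about get a counter.
    $counts = array_fill_keys($keywords, 0);

    // Assumption: these characters separate words; extend the list as needed.
    $delimiters = " \t\r\n.,;:!?\"()[]";

    $word = strtok($text, $delimiters);
    while ($word !== false) {
        // Words that are not keywords are simply skipped.
        if (isset($counts[$word])) {
            $counts[$word]++;
        }
        $word = strtok($delimiters); // continue from where the last call stopped
    }

    return $counts;
}

$tally = tallyKeywords('red green red, blue. red', ['red', 'blue', 'pink']);
```

Keep in mind that strtok holds its position in internal state, so don't start tokenizing another string inside the loop.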