Here is the task: I need to recognize whether a string contains a town name. In other words, I need to recognize a town in some text.
As input I have the text to search and a geocode. Depending on the geocode, a list of towns is loaded from the DB.
In the current implementation I loop over the list of towns and try to match each one, using short-circuit evaluation. Like:
if (stripos($text, $currentTown) !== false &&
    preg_match('#\b' . preg_quote($currentTown, '#') . '\b#i', $text)) {
    // add town to recognized list
}
The problem is that with a large list, e.g. the towns for the UK (about 40,000), the loop takes quite a while.
So my question is: how do I optimize the recognition time? Maybe there is some more advanced way to search the array?
Any ideas are welcome.
Thanks.
Although my first instinct would be to use MySQL full-text search, I will attempt to solve your problem as posed. I will try to start with the suggestions that give the best results.
Keep all your town data in lowercase (or at least the part you search in) and use $text = strtolower($text);
before searching, so you can use strpos.
A case-sensitive search is faster than a case-insensitive one.
Why bother with preg_match()? You are doing 99% the same thing with stripos, so you can skip it unless you really need the whole-word (\b) check.
Perhaps add small checks, e.g. if strlen($text) < 4, don't even try to search, as very short inputs give horrible results.
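To make the lowercase + strpos idea concrete, here is a rough sketch. findTowns() is a made-up helper name, and it assumes the town list is already stored in lowercase and is plain ASCII (for other alphabets you would probably want mb_strtolower instead). The whole-word check is kept as an optional second step that only runs on towns that already passed the cheap substring test.

```php
<?php
// Hypothetical helper: returns the towns from $towns that occur in $text.
// Assumes every entry in $towns is already lowercase.
function findTowns(string $text, array $towns): array
{
    $text = strtolower($text); // lowercase the haystack once, not once per town
    $found = [];
    foreach ($towns as $town) {
        // Cheap case-sensitive substring test first.
        if (strpos($text, $town) === false) {
            continue;
        }
        // Optional whole-word check, only run on the few candidates left.
        if (preg_match('#\b' . preg_quote($town, '#') . '\b#', $text)) {
            $found[] = $town;
        }
    }
    return $found;
}

$towns = ['bath', 'london', 'york'];
print_r(findTowns('Flights from London to New York', $towns));
```

If you drop the preg_match() step, "bath" would also be reported for a text like "bathroom fittings", so whether you can skip it depends on how strict the matching has to be.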
Order your data by length (this is expensive, so do it once and store the result) and skip the towns that are longer than the input, since they cannot match.
Order your data alphabetically and only go through the part that matches the first letter (or even the first two letters).
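A minimal sketch of the length idea, with made-up helper names (sortByLength, findTownsByLength). It assumes the towns are lowercase and relies on the fact that a town longer than the text can never be a substring of it, so with a shortest-first list the loop can stop at the first town that is too long.

```php
<?php
// Hypothetical helper: sort shortest-first. Do this once and store the result.
function sortByLength(array $towns): array
{
    usort($towns, fn($a, $b) => strlen($a) <=> strlen($b));
    return $towns;
}

// Assumes $sortedTowns is lowercase and sorted shortest-first.
function findTownsByLength(string $text, array $sortedTowns): array
{
    $text = strtolower($text);
    $maxLen = strlen($text);
    $found = [];
    foreach ($sortedTowns as $town) {
        if (strlen($town) > $maxLen) {
            break; // every remaining town is longer than the text
        }
        if (strpos($text, $town) !== false) {
            $found[] = $town;
        }
    }
    return $found;
}

$sorted = sortByLength(['birmingham', 'york', 'london']);
print_r(findTownsByLength('in York', $sorted));
```

This only pays off when the input texts are short; for long texts almost no town gets skipped.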
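One way to read the first-letter idea, as a sketch only: group the towns by their first character once, then for each text check only the groups whose letter starts some word in that text. The helper names (bucketByFirstLetter, findTownsInBuckets) are invented for the example, and it assumes lowercase ASCII town names.

```php
<?php
// Hypothetical helper: group lowercase towns by their first character.
function bucketByFirstLetter(array $towns): array
{
    $buckets = [];
    foreach ($towns as $town) {
        if ($town === '') {
            continue; // nothing to index
        }
        $buckets[$town[0]][] = $town;
    }
    return $buckets;
}

// Checks only the buckets whose letter starts at least one word of $text.
function findTownsInBuckets(string $text, array $buckets): array
{
    $text = strtolower($text);
    // Collect the first character of every word in the text.
    $letters = [];
    foreach (preg_split('/[^a-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $letters[$word[0]] = true;
    }
    $found = [];
    foreach (array_keys($letters) as $letter) {
        foreach ($buckets[$letter] ?? [] as $town) {
            if (strpos($text, $town) !== false) {
                $found[] = $town;
            }
        }
    }
    return $found;
}

$buckets = bucketByFirstLetter(['bath', 'bristol', 'london', 'york']);
print_r(findTownsInBuckets('Trains to Bristol and York', $buckets));
```

Here the "l" bucket is never touched because no word in the text starts with "l". Build the buckets once per geocode and store them, otherwise the grouping costs as much as the loop it replaces.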
Possibly cache results / searches. Then you only have to look in your cache when it already holds a matching row (but yes, a cache miss hurts).
If you have large data sets, maybe PHP's Iterator interface can help out. It could make going over each record cheaper, since you don't have to hold the whole list in memory at once.