I have two files: one is full of keyword sequences (~20k lines), the other is full of regular expressions (~2.5k).
I want to test each keyword against each regexp and print the ones that match. I counted, and with my files that makes around 22,750,000 tests. I am using the following code:
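```php
// $arrayRegexp and $arrayKeywords hold the lines of the two files
$count = 0;   // number of regex/keyword tests
$countM = 0;  // number of matches
foreach ($arrayRegexp as $r) {
    foreach ($arrayKeywords as $k) {
        $count++;
        if (preg_match($r, $k, $match)) {
            $countM++;
            // $match[1] is the first capture group of $r
            echo $k.' matched with keywords '.$match[1].'<br/>';
        }
    }
}
echo "$count tests with $countM matches.";
```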
Unfortunately, after computing for a while, only some of the matches are displayed, and the final line with the counts never appears. Even stranger, if I comment out the preg_match section and keep only the two foreach loops and the count output, everything works fine.
I believe this is due to the sheer amount of data being processed, but I would like to know whether there are recommendations I haven't followed for this kind of operation. The regular expressions I use are very complicated, and I cannot replace them with something else.
Ideas anyone?
Increase the execution time. Use this line in .htaccess:

```apache
php_value max_execution_time 80000
```
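If .htaccess overrides are not available (php_value is only honored when PHP runs as an Apache module, not under CGI/FastCGI or PHP-FPM), a similar effect can be had from inside the script. A minimal sketch using set_time_limit(), assuming the host has not disabled that function:

```php
<?php
// Remove the execution time limit for this request only (0 = no limit).
// Call this before the long-running loops start.
set_time_limit(0);

// Alternatively, set an explicit limit in seconds, e.g. the same value as the
// .htaccess line: ini_set('max_execution_time', '80000');

// set_time_limit() updates the max_execution_time setting, so this reports it.
echo ini_get('max_execution_time'), "\n";
```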
There are two optimization options:
Combine the regular expressions into a single alternation, /(regex1|regex2|...)/. PCRE can often evaluate alternatives faster than PHP can execute a loop. For example:
```php
$rx = implode("|", $arrayRegexp); // assumes the patterns have no /regexp/ delimiters
preg_replace_callback("#($rx)#", "print", $arrayKeywords); // "print" is only a placeholder callback
```
Since print is a language construct rather than a function, it cannot be passed as a callback. Instead, define a custom function that outputs and counts the results and simply returns, e.g., an empty string.
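A minimal sketch of such a callback, written as a closure. The sample patterns and keywords are hypothetical stand-ins for the two files, and the patterns are assumed to contain no delimiters, modifiers, or `#` characters. Note that this counts matches rather than regex/keyword tests, and that at each position in a keyword only the first alternative that matches is reported, so the count can be lower than the nested loop's.

```php
<?php
// Hypothetical sample data standing in for the contents of the two files.
$arrayRegexp   = ['foo(\d+)', 'ba[rz]'];
$arrayKeywords = ['foo42 bar', 'nothing here', 'baz'];

$countM = 0; // number of matches found
$rx = implode('|', $arrayRegexp);

// Custom callback used instead of print: output and count each match, and
// return an empty string because the replaced strings are not needed.
$callback = function (array $m) use (&$countM): string {
    $countM++;
    echo htmlspecialchars($m[0]), "<br/>\n";
    return '';
};

// The return value (the keywords with matches removed) is discarded.
preg_replace_callback("#($rx)#", $callback, $arrayKeywords);
echo "$countM matches.\n";
```

With this sample data the callback fires for "foo42", "bar" and "baz", so three matches are reported.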
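Come to think of it, preg_replace_callback also accepts an array of regular expressions (and an array of subjects). It does apply every regex to every string, but each pattern operates on the result of the previous replacement, so a callback that returns an empty string removes text that later patterns would otherwise see.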
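A minimal sketch of the array form, with hypothetical sample data. Each pattern in the array needs its own delimiters, and the callback cannot tell which pattern produced a given match. Returning the match unchanged keeps every keyword intact for the following patterns, so overlapping matches from different regexes are all reported.

```php
<?php
// Hypothetical sample data. With an array of patterns, each entry needs its
// own delimiters.
$arrayRegexp   = ['#foo#', '#oob#'];
$arrayKeywords = ['foobar', 'baz'];

$matches = [];
preg_replace_callback(
    $arrayRegexp,
    function (array $m) use (&$matches): string {
        $matches[] = $m[0];
        // Return the match unchanged, so the next pattern runs on the
        // unmodified keyword rather than on a string with matches removed.
        return $m[0];
    },
    $arrayKeywords
);

echo implode(', ', $matches), "\n"; // foo, oob
```

For comparison, the single alternation #(foo|oob)# would report only "foo" for "foobar", because matching resumes after the first hit.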