I am not very good at regular expressions.
I have various files that have a repeated string inside them:
$find = "><script contentType=\"application/x-javascript\"
>
if(event.target.hostContainer)";
But sometimes, instead of the 2 line breaks you can see in the above string, there are 3 or 1. Granted, it's a stupid problem to have to overcome, but unfortunately the file is a PDF, so I don't have control over its output.
How might I go about searching for the above string while ignoring the line breaks?
The context of my question is:
$file = file_get_contents('pdfs/another1.pdf');
$find = "><script contentType=\"application/x-javascript\"
>
if(event.target.hostContainer)";
$replace = "whatever bla bla";
$output_str = str_replace($find, $replace, $file);
For one thing, str_replace doesn't use regular expressions for the search string. The correct function is preg_replace.
Here's a regex that works in this case:
$find = '#><script contentType="application/x-javascript"\s*>\s*if\(event\.target\.hostContainer\)#U';
$output_str = preg_replace($find, $replace, $file);
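The regex has several "\" (escape) characters because ".", "(", and ")" have special meaning in regex. Each "\s*" matches any run of whitespace, including none, so the pattern matches no matter how many line breaks surround the ">". The regex is enclosed in the '#' delimiter, which is convenient here because the pattern itself contains a "/". The 'U' modifier at the end makes the quantifiers ungreedy, so each "\s*" matches as little as possible; it is a precaution rather than a requirement, since preg_replace already replaces every match in the string by default.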
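To check the pattern against the variations described in the question, here is a minimal sketch. The sample $file string is made up for illustration (the A, B, and C markers and the exact placement of the line breaks are assumptions, not taken from a real PDF); it contains the snippet with one, two, and three line breaks.

```php
<?php
// Hypothetical stand-in for the PDF contents: the same snippet three times,
// with one, two, and three line breaks around the closing ">".
$prefix = '><script contentType="application/x-javascript"';
$suffix = 'if(event.target.hostContainer)';
$file = "A" . $prefix . ">\n" . $suffix
      . "B" . $prefix . "\n>\n" . $suffix
      . "C" . $prefix . "\n>\n\n" . $suffix;

// Pattern and replacement as in the answer above.
$find = '#><script contentType="application/x-javascript"\s*>\s*if\(event\.target\.hostContainer\)#U';
$replace = "whatever bla bla";

// The optional fifth argument receives the number of replacements made.
$output_str = preg_replace($find, $replace, $file, -1, $count);

echo $count, "\n";       // prints 3
echo $output_str, "\n";  // prints AwhateverblablaB... with the spaces intact, see below
```

With this sample input, all three variants are replaced in a single call, so $output_str ends up as "Awhatever bla blaBwhatever bla blaCwhatever bla bla".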
A complete explanation of PHP regex is available here: http://us1.php.net/manual/en/reference.pcre.pattern.syntax.php