I need a way to strip all string literals from PHP files. My current regexp solution works fine when there are no nested quotes in the string. I tried updating it to handle escaped quotes as well, which worked in most cases, except when the string contains escaped escape characters (backslashes).
This is what it should be able to handle if it is done correctly:
"text"
"\"text\""
"\\"
"\"\\\""
So as I see it, it needs to handle both an even number and an odd number of escape characters in front of a quote. But how do you express this in a regexp?
Update
I want to clean up PHP files to make them easier to search through and to index different parts of them; it's for a small project that I am playing with. Since literals can contain almost anything, they can also contain data that looks like what I am searching for. So I want to remove anything in the files that is wrapped in " or '.
"/\"[^\"]*\"/"
This works unless there is a nested (escaped) quote, as in "\"data\"".
"/\"(\\\\\"|[^\"])*\"/"
This works unless the string contains an escaped backslash, as in "\\".
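To make the two failure modes concrete, here is a minimal reproduction. The two patterns are the ones quoted above; the one-line inputs and variable names are made up for illustration.

```php
<?php
// The two patterns exactly as written above.
$first  = "/\"[^\"]*\"/";
$second = "/\"(\\\\\"|[^\"])*\"/";

// Illustrative inputs (assumed, not taken from a real file).
// Nowdoc syntax keeps the sample PHP source raw, with no extra escaping.
$nested = <<<'SRC'
$v = "\"data\"";
SRC;
$escaped = <<<'SRC'
$v = "\\" . "x";
SRC;

// First pattern: stops at the escaped quote and leaves debris behind.
$r1 = preg_replace($first, '', $nested);
// Second pattern: copes with the escaped quote.
$r2 = preg_replace($second, '', $nested);
// Second pattern: reads the second backslash plus the closing quote as an
// escaped quote, so the match runs on into the next literal.
$r3 = preg_replace($second, '', $escaped);

echo $r1, "\n", $r2, "\n", $r3, "\n";
```

Only the second call leaves clean output; the other two leave fragments of the literals in place.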
This is what I need
$var = "...";
Becomes
$var = ;
You could use this regular expression based substitution:
Find: ((?<!\\)(?:\\.)*)(["'])(?:\\.|(?!\2).)*?\2
Replace: $1
Note that if you are going to use this regular expression in PHP (where you encode it as a string literal), you need to escape the backslashes and the double quote in that regular expression, like this:
preg_replace("~((?<!\\\\)(?:\\\\.)*)([\"'])(?:\\\\.|(?!\\2).)*?\\2~s", "$1", $input);
As PHP string literals can span multiple lines, the s modifier is added so that . also matches newline characters.
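As a quick check, the substitution above can be run against the four cases from the question plus a single-quoted literal. This is only a sketch: the wrapper name strip_literals and the sample source are assumptions made for the demonstration.

```php
<?php
// Hypothetical wrapper around the substitution shown above.
// preg_replace() returns null on failure (e.g. a backtrack limit), hence ?string.
function strip_literals(string $input): ?string
{
    return preg_replace(
        "~((?<!\\\\)(?:\\\\.)*)([\"'])(?:\\\\.|(?!\\2).)*?\\2~s",
        "$1",
        $input
    );
}

// Sample PHP source held in a nowdoc so that its backslashes stay raw.
$source = <<<'SRC'
$a = "text";
$b = "\"text\"";
$c = "\\";
$d = "\"\\\"";
$e = 'it\'s';
SRC;

$stripped = strip_literals($source);
echo $stripped, "\n";
```

Each assignment comes back with the literal removed, in the same shape as the $var = ; example in the question.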
See it run on eval.in
NB: This does not cover heredoc (and nowdoc) notation, so you'll need to think about that as well.
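One possible way to cover heredocs is to switch technique and let PHP's own tokenizer find the literals instead of a regular expression. The following is a rough sketch of that alternative, not part of the regex approach above. The function name strip_literals_tokenized and the sample source are assumptions. It relies on token_get_all(), which only tokenizes PHP code that follows an opening <?php tag.

```php
<?php
// Hypothetical helper: drops string literals using the tokenizer, not a regex.
function strip_literals_tokenized(string $code): string
{
    $out = '';
    $inQuotes = false;  // inside an interpolated "..." string
    $inHeredoc = false; // inside a heredoc or nowdoc body

    foreach (token_get_all($code) as $token) {
        if (is_string($token)) {
            // Single-character tokens arrive as plain strings.
            if ($token === '"') {
                $inQuotes = !$inQuotes; // delimiter of an interpolated string
            } elseif (!$inQuotes && !$inHeredoc) {
                $out .= $token;
            }
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_START_HEREDOC) {
            $inHeredoc = true;
        } elseif ($id === T_END_HEREDOC) {
            $inHeredoc = false;
        } elseif ($id !== T_CONSTANT_ENCAPSED_STRING && !$inQuotes && !$inHeredoc) {
            $out .= $text; // keep everything that is not part of a literal
        }
    }
    return $out;
}

// Illustrative input; the nowdoc keeps the embedded PHP source raw.
$sample = <<<'SRC'
<?php
$a = "text";
$b = 'it\'s';
$c = "hi $name";
$d = <<<EOT
heredoc "body"
EOT;
$e = 1;
SRC;

$clean = strip_literals_tokenized($sample);
echo $clean, "\n";
```

This sketch does not try to handle a double-quoted string nested inside the interpolation of another double-quoted string, and it leaves backtick (shell) strings untouched.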