Preg patterns for ignoring escaped characters

I want to create a RegEx that finds strings that begin and end in single or double quotes.

For example I can match such a case like this:

String: "Hello World"
RegEx: /[\"\'][^\"\']+[\"\']/

However, the problem occurs when quotes appear in the string itself like so:

String: "Hello" World"

We know the above expression will not work.

What I want is to allow escaped quotes within the string itself, since that functionality will be required anyway:

String: "Hello\" World"

Now I could come up with a long and complicated expression with various patterns in a group, one of them being:

RegEx: /[\"\'][^\"\']+(\\\"|\\\')+[^\"\']+[\"\']/

However that to me seems excessive, and I think there may be a shorter and more elegant solution.

Intended syntax:

run arg1 "arg1" "arg3 with \"" "\"arg4" "arg\"\"5"

As you can see, the quotes are really only used to make sure that strings with spaces are counted as a single string. Do not worry about arg1; I should be able to match unquoted arguments.

To make this easier: arguments can only be quoted using double quotes, so I've taken single quotes out of the requirements of this question.

I have modified Rui Jarimba's example:

/(?<=")(\\")*([^"]+((\\(\"))*[^"])+)((\\"")|")/

This now accounts pretty well for most cases, however there is one final case that can defeat this:

run -a "arg3 \" p2" "\"sa\"mple\"\\"

The second argument ends with \\", which is the conventional way to allow a backslash at the end of a nested string. Unfortunately, the regex treats this as an escaped quote, since the sequence \" still appears at the end of the string.
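For reference, one common way to handle a trailing \\ is to treat a backslash as escaping whatever single character follows it, so that \\ is consumed as a pair before the closing quote is examined. The following is only a sketch under that assumption (double quotes only, and it is not the pattern quoted above); the variable names are made up for illustration.

```php
<?php
// Assumption: a backslash always escapes the single character after it.
// The string value of $pattern is: /"(?:[^"\\]|\\.)*"/
$pattern = '/"(?:[^"\\\\]|\\\\.)*"/';

// The problematic command line from above; the second argument ends in \\"
$subject = 'run -a "arg3 \" p2" "\"sa\"mple\"\\\\"';

preg_match_all($pattern, $subject, $matches);

// Prints each quoted argument on its own line:
//   "arg3 \" p2"
//   "\"sa\"mple\"\\"
foreach ($matches[0] as $argument) {
    echo $argument, "\n";
}
```

Because the escape pair \\ is consumed as a unit, the quote that follows it is seen as the closing delimiter rather than as an escaped quote.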

Try this regex:

['"]([^'"]+((\\(\"|'))*[^'"])+)['"]

Given the following string:

"Hello" World 'match 2' "wqwqwqwq wwqwqqwqw" no match here oopop "Hello \" World"

It will match

"Hello"
'match 2'
"wqwqwqwq wwqwqqwqw"
"Hello \" World"

Firstly, please write your regexes in single-quoted strings. That saves you a lot of escaping.

Then I see two possibilities. The problem with your attempt is that it allows escaped quotes in only one place in the string, and only consecutively. It also allows different quotes at the beginning and the end. You could use a backreference to get around that. So this would be a) slightly more elegant and b) correct:

$pattern = '/(["\'])(\\\\"|\\\\\'|[^"\'])+\1/';

Note that the order of the alternation is important!
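To see why the order matters, here is a small comparison. It is only a sketch: both patterns are written with four backslashes per literal backslash so that the regex engine actually sees a backslash, and $classFirst is a hypothetical reordered variant added purely for contrast.

```php
<?php
$subject = '"Hello \" World"';

// Escaped quotes are tried before the catch-all character class.
$escapedFirst = '/(["\'])(\\\\"|\\\\\'|[^"\'])+\1/';

// Hypothetical variant with the character class tried first.
$classFirst = '/(["\'])([^"\']|\\\\"|\\\\\')+\1/';

preg_match($escapedFirst, $subject, $a);
preg_match($classFirst, $subject, $b);

echo $a[0], "\n"; // "Hello \" World"
echo $b[0], "\n"; // "Hello \"
```

In the second pattern the character class swallows the backslash on its own, so the quote after it is taken as the closing delimiter.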

The drawback is that you now have to escape even the quote character that is not used to delimit the string, which you probably don't want. Therefore, the other possibility is to use lookarounds (since backreferences cannot be used inside character classes):

$pattern = '/(["\'])(?:(?!\1).|(?<=\\\\)\1)+\1/';

Note that four consecutive backslashes are always necessary to match a single literal backslash. That is because in the actual string $pattern they end up as \\ and then the regex engine "uses" the first one to escape the second one.
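The two layers of unescaping can be checked directly. This is just a minimal illustration; chr(92) is used to build a backslash without having to write one in the source.

```php
<?php
// Source text: four backslashes between the delimiters.
$pattern = '/\\\\/';

// The string value holds only two backslashes (chr(92) is a backslash).
var_dump($pattern === '/' . chr(92) . chr(92) . '/'); // bool(true)

// The regex engine reads those two as one literal backslash,
// so the pattern matches exactly once in the 3-character string a\b.
$subject = 'a' . chr(92) . 'b';
$count = preg_match_all($pattern, $subject);
var_dump($count); // int(1)
```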

This matches either an arbitrary character that is not the starting quote, or the starting quote itself if the previous character was a backslash.
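As a quick check of the lookaround version, here is a sketch with a made-up subject string that contains an unescaped single quote inside a double-quoted string as well as escaped double quotes:

```php
<?php
$pattern = '/(["\'])(?:(?!\1).|(?<=\\\\)\1)+\1/';

// Subject value: say "it's \"fine\"" and 'single'
$subject = 'say "it\'s \"fine\"" and \'single\'';

preg_match_all($pattern, $subject, $matches);

// Prints:
//   "it's \"fine\""
//   'single'
foreach ($matches[0] as $quoted) {
    echo $quoted, "\n";
}
```

The apostrophe in it's needs no escaping here, because only the quote captured by group 1 is treated as a delimiter.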

Working demo.

This by the way is equivalent to:

$pattern = '/(["\'])(?:\\\\\1|(?!\1).)+\1/';

But here you have to write the alternation in this order again.

Working demo.

One final note. You can avoid the backreference by providing the two possible strings separately (single and double quoted strings):

$pattern = '/"(?:\\\\"|[^"])+"|\'(?:\\\\\'|[^\'])+\'/';

But you said you were looking for something short and elegant ;) (although, this last one might be more efficient... but you'd have to profile that).

Note that all my regexes leave one case unconsidered: escaped quotes outside of quoted strings. For example, Hello \" World "Hello" World will give you " World " instead of "Hello". You can avoid this using another negative lookbehind (using as an example the second regex for which I provided a working demo; it would work the same for all others):

$pattern = '/(?<!\\\\)(["\'])(?:\\\\\1|(?!\1).)+\1/';
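The effect of the extra lookbehind can be seen by running both variants side by side. This is a small sketch; the variable names are only for illustration.

```php
<?php
// Subject value: Hello \" World "Hello" World
$subject = 'Hello \" World "Hello" World';

$without = '/(["\'])(?:\\\\\1|(?!\1).)+\1/';
$with    = '/(?<!\\\\)(["\'])(?:\\\\\1|(?!\1).)+\1/';

preg_match_all($without, $subject, $m1);
preg_match_all($with, $subject, $m2);

print_r($m1[0]); // one match: " World "  (starts at the escaped quote)
print_r($m2[0]); // one match: "Hello"
```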