PHP regex: match using quotes, but don't capture them

I'm unsure whether I should be using preg_match, preg_match_all, or preg_split with delimiter capture (PREG_SPLIT_DELIM_CAPTURE). I'm also unsure of the correct regex.

Given the following:

$string = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";

I want to get an array with the following elements:

[0] = "ok"
[1] = "that\'s"
[2] = "yeah that's \"cool\""

You cannot do this with a regular expression because you're trying to parse a non-context-free grammar. Write a parser.

Outline:

  • read character by character, if you see a \ remember it.
  • if you see a " or ' check if the previous character was \. You now have your delimiting condition.
  • record all the tokens in this manner
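The outline above can be sketched in PHP roughly as follows. This is a minimal sketch, not a definitive implementation: the function name tokenize is made up for illustration, and it assumes the behaviour the question asks for (quotes dropped, bare words split on whitespace, backslashes kept verbatim). Instead of looking back one character, it keeps an "escaped" flag, so that a doubled backslash before a quote is not mistaken for an escaped quote.

```php
<?php
// Hypothetical helper: splits $input into bare words and quoted strings.
// Assumptions: quotes are dropped, backslash escapes are kept as written,
// and an unterminated quote runs to the end of the input.
function tokenize(string $input): array
{
    $tokens  = [];
    $current = '';
    $quote   = null;   // the open quote character, or null outside quotes
    $escaped = false;  // true if the previous character was an unescaped backslash
    $length  = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $ch = $input[$i];

        if ($escaped) {
            // The character after a backslash is always taken literally.
            $current .= $ch;
            $escaped = false;
        } elseif ($ch === '\\') {
            // Remember the backslash and keep it in the token.
            $current .= $ch;
            $escaped = true;
        } elseif ($quote !== null) {
            if ($ch === $quote) {
                // Closing quote: the token is complete.
                $tokens[] = $current;
                $current = '';
                $quote = null;
            } else {
                $current .= $ch;
            }
        } elseif ($ch === '"' || $ch === "'") {
            // Opening quote: flush any pending bare word first.
            if ($current !== '') {
                $tokens[] = $current;
                $current = '';
            }
            $quote = $ch;
        } elseif (strpos(" \t\r\n", $ch) !== false) {
            // Whitespace outside quotes ends a bare word.
            if ($current !== '') {
                $tokens[] = $current;
                $current = '';
            }
        } else {
            $current .= $ch;
        }
    }

    // Trailing bare word, or the content of an unterminated quote.
    if ($current !== '') {
        $tokens[] = $current;
    }

    return $tokens;
}

$string = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";
print_r(tokenize($string));
```

For the string from the question this should give three tokens: ok, that\'s cool, and yeah that's \"cool\".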

Your desired result set seems to trim spaces, and you also lost a couple of the backslashes. Perhaps this is a mistake, but it can be important.

I would expect:

[0] = " ok " // <-- spaces here
[1] = "that\\'s cool"
[2] = " \"yeah that's \\\"cool\\\"\"" // leading space here, and \" remains

Actually, you might be surprised to find that you can do this in regex:

preg_match_all("((?|\"((?:\\\\.|[^\"])+)\"|'((?:\\\\.|[^'])+)'|(\w+)))",$string,$m);

The desired result array will be in $m[1].
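If the escaping in that one-liner is hard to follow, here is the same pattern written in a nowdoc so that no PHP-level escaping is needed. The only changes are the ~ delimiters and the $pattern variable, which are my choices for readability. The (?| ... ) construct is a PCRE branch reset group: whichever alternative matches, its capture lands in group 1, which is why everything ends up in $m[1].

```php
<?php
// Same subject string as in the question.
$string = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";

// Nowdoc: the text between the markers is taken literally, so \\. below is
// the regex "an escaped backslash followed by any character".
$pattern = <<<'RX'
~(?|"((?:\\.|[^"])+)"|'((?:\\.|[^'])+)'|(\w+))~
RX;

preg_match_all($pattern, $string, $m);

// $m[0] holds the full matches (quotes included),
// $m[1] holds the captures (quotes stripped).
print_r($m[1]);
```

Note that bare words are matched with \w+, so an unquoted token containing other characters (a hyphen, for example) would be split or skipped.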

You can do it with a regex:

$pattern = <<<'LOD'
~
(?J) 

# Definitions #
(?(DEFINE)
  (?<ens> (?> \\{2} )+ ) # even number of backslashes

  (?<sqc> (?> [^\s'\\]++  | \s++ (?!'|$)    | \g<ens> | \\ '?+    )+ ) # single quotes content
  (?<dqc> (?> [^\s"\\]++  | \s++ (?!"|$)    | \g<ens> | \\ "?+    )+ ) # double quotes content
  (?<con> (?> [^\s"'\\]++ | \s++ (?!["']|$) | \g<ens> | \\ ["']?+ )+ ) # content
)
# Pattern #
    \s*+ (?<res> \g<con>)
| ' \s*+ (?<res> \g<sqc>) \s*+ '?+
| " \s*+ (?<res> \g<dqc>) \s*+ "?+ 
~x
LOD;
$subject = " ok 'that\\'s cool' \"yeah that's \\\"cool\\\"\"";

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
foreach($matches as $match) {
    var_dump($match['res']);
}

I chose to trim spaces in all results, so " abcd " will give abcd. This pattern allows as many backslashes as you want, anywhere you want. If a quoted string is not closed at the end of the string, the end of the string is treated as the closing quote (this is why I made the closing quotes optional). So abcd " ef'gh will give you abcd and ef'gh.