Get all words that are not preceded by a specific prefix group

I have a string of the following form

$string = "This is {test} for [a]{test2} for {test3}.";

I want to get every curly-bracket group that is not immediately preceded by a square-bracket group. Thus, in the above string I would like to get {test} and {test3} but not the {test2} in [a]{test2}.

I found in the answer https://stackoverflow.com/a/977294/2311074 that this might be possible with negative lookahead. So I tried

  $regex      = '/(?:(?!\[[^\}]+\])\{[^\}]+\})/';
  echo preg_match_all($regex, $string, $matches) . '<br>';
  print_r($matches);

but this still gives me all three curly brackets.

3

Array ( [0] => Array ( [0] => {test} [1] => {test2} [2] => {test3} ) )

Why is this not working?

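Your regex fails because the negative lookahead is tested at the same position where \{ has to match. A lookahead inspects the text to the right of the current position, so (?!\[[^\}]+\]) only asserts that the text starting there is not a [, 1+ chars other than } and then a ]. Since the next character must be { for the rest of the pattern to match, that assertion is always true, and you get all {...} substrings as a result.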
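One way to check this is to compare the original pattern with the same pattern minus the lookahead. This is only a minimal sketch; the variable names $withLookahead and $withoutLookahead are made up for illustration:

```php
<?php
$string = "This is {test} for [a]{test2} for {test3}.";

// Original pattern: the lookahead is tested at the position where \{ must match.
$withLookahead    = '/(?:(?!\[[^\}]+\])\{[^\}]+\})/';
// Same pattern with the lookahead removed.
$withoutLookahead = '/\{[^\}]+\}/';

preg_match_all($withLookahead, $string, $a);
preg_match_all($withoutLookahead, $string, $b);

// Both patterns return the same three matches, so the lookahead filters nothing.
var_dump($a[0] === $b[0]); // bool(true)
```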

Use (*SKIP)(*FAIL) technique:

\[[^]]*]\{[^}]+}(*SKIP)(*F)|\{[^\}]+}

Details:

  • \[[^]]*]\{[^}]+}(*SKIP)(*F) - matches
    • \[ - a [
    • [^]]* - 0+ chars other than ]
    • ]\{ - ]{ substring
    • [^}]+ - 1+ chars other than }
    • } - a literal }
    • (*SKIP)(*F) - PCRE verbs that discard the text matched so far and force the engine to look for the next match starting from the current position, i.e. right after the discarded text (as if a match had occurred there)
  • | - or
  • \{[^\}]+}:
    • \{ - a {
    • [^\}]+ - 1+ chars other than } and
    • } - a literal }.

See the PHP demo:

$string = "This is {test} for [a]{test2} for {test3}.";
$regex      = '/\[[^]]*]\{[^}]+}(*SKIP)(*F)|\{[^}]+}/';
echo preg_match_all($regex, $string, $matches) . "\n";
print_r($matches[0]);

Output:

2
Array
(
    [0] => {test}
    [1] => {test3}
)

If you are sure that an opening curly brace can only ever be preceded by a balanced pair of square brackets, then a negative lookbehind will do the job:

(?<!]){[^}]*}
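A quick sketch of the lookbehind variant in PHP, reusing the string from the question (the { is escaped here only for readability). The second input, $unbalanced, is a made-up example to show where the two approaches differ: the lookbehind rejects any { that directly follows a ], while the (*SKIP)(*F) pattern only skips a {...} preceded by a complete [...] pair.

```php
<?php
$string     = "This is {test} for [a]{test2} for {test3}.";
$lookbehind = '/(?<!])\{[^}]*}/';
$skipFail   = '/\[[^]]*]\{[^}]+}(*SKIP)(*F)|\{[^}]+}/';

preg_match_all($lookbehind, $string, $m);
print_r($m[0]); // {test} and {test3}

// Hypothetical input with a stray ] and no opening [
$unbalanced = "a]{x} and {y}";
preg_match_all($lookbehind, $unbalanced, $lb);
preg_match_all($skipFail, $unbalanced, $sf);
print_r($lb[0]); // only {y}: the { of {x} directly follows a ]
print_r($sf[0]); // {x} and {y}: there is no complete [...] pair to skip
```

Which behavior is preferable depends on whether a lone ] before a { should count as a prefix in your data.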