Output only the PHP array values that do not contain HTML tags inside parentheses

Here is a sample PHP array that illustrates my question:

$array = array('1' => 'Cookie Monster (<i>eats cookies</i>)',
               '2' => 'Tiger (eats meat)',
               '3' => 'Muzzy (eats <u>clocks</u>)',
               '4' => 'Cow (eats grass)');

I need to return only the values from this array that do not contain an HTML tag inside the parentheses:

- Tiger (eats meat)
- Cow (eats grass)

To do this, I tried the following code:

$array_no_tags = preg_grep("/[A-Za-z]\s\(^((?!<(.*?)(\h*).*?>(.*?)<\/\1>).)*$\)/", $array);
foreach ($array_no_tags as $a_n_t) {
    echo "- " . $a_n_t . "<br />";
}

My reasoning was: [A-Za-z] is any letter, \s is a space, \( is the opening parenthesis, ^((?! starts the tag-exclusion construct, <(.*?)(\h*).*?>(.*?)<\/\1> is the tag itself, ).)*$ ends the tag-exclusion construct, and \) is the closing parenthesis.

It does not work: print_r($array_no_tags); prints an empty array.

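You could use the following expression to match strings with HTML tags inside of parentheses:

/\([^)]*<(\w+)>[^<>]*<\/\1>[^)]*\)/

Then pass the PREG_GREP_INVERT flag as the third argument so that preg_grep() returns only the items that don't match. Note that inside a double-quoted PHP string the back reference has to be written as \\1, because "\1" would be interpreted as an octal character escape.

$array_no_tags = preg_grep("/\([^)]*<(\w+)>[^<>]*<\/\\1>[^)]*\)/", $array, PREG_GREP_INVERT);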

Explanation:

  • \( - Match the literal ( character
    • [^)]* - Negated character class to match zero or more non-) characters
    • <(\w+)> - Capturing group one that matches the opening element's tag name
    • [^<>]* - Negated character class to match zero or more non-<> characters
    • <\/\1> - Back reference to capturing group one to match the closing tag
    • [^)]* - Negated character class to match zero or more non-) characters
  • \) - Match the literal ) character
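
As a rough, self-contained sketch of the inverted preg_grep() approach applied to the sample array from the question (the helper name filterNoParenTags is only illustrative and does not come from the question or from PHP itself):

```php
<?php
// Sample data from the question.
$array = array('1' => 'Cookie Monster (<i>eats cookies</i>)',
               '2' => 'Tiger (eats meat)',
               '3' => 'Muzzy (eats <u>clocks</u>)',
               '4' => 'Cow (eats grass)');

/**
 * Hypothetical helper: keep only the items that do NOT contain a
 * simple <tag>...</tag> pair inside parentheses.
 */
function filterNoParenTags(array $items): array
{
    // Single-quoted string, so the back reference can be written as \1.
    $pattern = '/\([^)]*<(\w+)>[^<>]*<\/\1>[^)]*\)/';

    // PREG_GREP_INVERT returns the items that do not match the pattern.
    return preg_grep($pattern, $items, PREG_GREP_INVERT);
}

$array_no_tags = filterNoParenTags($array);

foreach ($array_no_tags as $a_n_t) {
    echo "- " . $a_n_t . "\n";
}
```

preg_grep() preserves the original keys, so the result here keeps keys 2 and 4; wrap the call in array_values() if you need a reindexed list.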

If you don't care about the parentheses around the element tag, then you could also just use the following simplified expression:

/<(\w+)>[^<>]+<\/\1>/

And likewise, you would use:

$array_no_tags = preg_grep("/<(\w+)>[^<>]+<\/\\1>/", $array, PREG_GREP_INVERT);
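
To see where the two expressions differ, here is a small sketch that runs both on a shortened list. The third item is a made-up example (not from the question) with a tag outside the parentheses, and the variable names are only illustrative:

```php
<?php
$array = array(
    'Tiger (eats meat)',
    'Muzzy (eats <u>clocks</u>)',
    // Hypothetical extra item: the tag sits outside the parentheses.
    'Dog <b>barks</b> (eats bones)',
);

// Requires the tag pair to be inside ( ... ).
$strict = '/\([^)]*<(\w+)>[^<>]*<\/\1>[^)]*\)/';
// Matches a tag pair anywhere in the string.
$simple = '/<(\w+)>[^<>]+<\/\1>/';

$kept_strict = preg_grep($strict, $array, PREG_GREP_INVERT);
$kept_simple = preg_grep($simple, $array, PREG_GREP_INVERT);

print_r($kept_strict); // keeps the "Tiger" and "Dog" items
print_r($kept_simple); // keeps only the "Tiger" item
```

So the simplified expression is only equivalent when tags never appear outside the parentheses in your data.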

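Your pattern looks a bit overcomplicated, and it can never match: the ^ placed after \( asserts the start of the string at a position that is no longer the start, which is why you get an empty array. I thought a simple negative lookahead that checks that there is no <x inside ( ) could be sufficient.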

$array_no_tags = preg_grep("/^(?!.*?\([^)<]*<\w)/", $array);

PHP demo at eval.in

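The negative lookahead (?! ... ) makes the match fail if there is an opening parenthesis \(, followed by [^)<]* (any number of characters that are neither ) nor <), followed by <\w (a less-than sign followed by a word character). Strings that contain no such sequence match at ^ and are returned.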
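
A minimal sketch of the lookahead version on the sample array follows. Item 5 is a hypothetical addition (not from the question) whose tag carries an attribute, included to show a case where this pattern and a pattern requiring a plain <tag>...</tag> pair disagree:

```php
<?php
$array = array('1' => 'Cookie Monster (<i>eats cookies</i>)',
               '2' => 'Tiger (eats meat)',
               '3' => 'Muzzy (eats <u>clocks</u>)',
               '4' => 'Cow (eats grass)',
               // Hypothetical extra item: a tag that carries an attribute.
               '5' => 'Spider (eats <a href="#">flies</a>)');

// No inversion needed: the negative lookahead does the excluding.
$lookahead = '/^(?!.*?\([^)<]*<\w)/';
$array_no_tags = preg_grep($lookahead, $array);

// For comparison: a pattern that only recognizes attribute-free <tag>...</tag> pairs.
$pair = '/\([^)]*<(\w+)>[^<>]*<\/\1>[^)]*\)/';
$kept_by_pair = preg_grep($pair, $array, PREG_GREP_INVERT);

print_r($array_no_tags); // items 2 and 4
print_r($kept_by_pair);  // items 2, 4 and 5, because <a href="#"> is not matched by <(\w+)>
```

The lookahead only looks for the start of a tag (< followed by a word character), so it is stricter about what it lets through, but it would also reject text such as "(a <b)" that is not really markup.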

Bear in mind that there are nice regex tools like regex101 available for testing patterns.