preg_match_all up to the next HTML comment tag, including the comment

I'm trying to get two things for each comment tag: the text between the brackets inside the tag, and all the text up to the next occurrence of a comment tag. At the moment I only get the text between the brackets. The content up to the next comment comes back as an empty string "". I'm kind of confused. Thanks!

header("Content-Type:text/plain");
$tmp= file_get_contents("filter.html");
preg_match_all('@<!--\[(.*?)\]-->(.*?)@su', $tmp, $found, PREG_SET_ORDER);
var_dump($found);

filter.html

<!--[%TEST%]-->
TEST
TEST
<!--[%DAS%]-->
DAS TEST
123456
<!--[%BKK%]-->
ABCDEFG
YXZ

The output I get is:

array(3) {
  [0]=>
  array(3) {
    [0]=>
    string(15) "<!--[%TEST%]-->"
    [1]=>
    string(6) "%TEST%"
    [2]=>
    string(0) ""
  }
  [1]=>
  array(3) {
    [0]=>
    string(14) "<!--[%DAS%]-->"
    [1]=>
    string(5) "%DAS%"
    [2]=>
    string(0) ""
  }
  [2]=>
  array(3) {
    [0]=>
    string(14) "<!--[%BKK%]-->"
    [1]=>
    string(5) "%BKK%"
    [2]=>
    string(0) ""
  }
}

Solution: change the regex into...

@<!--\[(.*?)\]-->(.*?)(?=<!--|$)@su

Codepad Viper Demo.

Explanation: the original regex almost correctly used the .*? expression to capture the non-comment parts. I say 'correctly' because the laziness modifier is indeed required here (otherwise the .* combo will happily consume the whole string). And I say 'almost' because the modifier is too lazy in this particular case: nothing follows it in the pattern, so even an empty string is enough to satisfy it (as '' does match /.*/). That's why you get those empty strings in $found - victims of laziness taken to the extreme, they were...
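
To see both extremes side by side, here is a minimal sketch. The two-section input string is a made-up stand-in for filter.html, and the variable names are only illustrative:

```php
<?php
// Hypothetical two-section input, standing in for filter.html.
$html = "<!--[%A%]-->\nfirst\n<!--[%B%]-->\nsecond";

// Lazy quantifier at the very end of the pattern: nothing forces it
// to grow, so group 2 is always the empty string.
preg_match_all('@<!--\[(.*?)\]-->(.*?)@su', $html, $lazy, PREG_SET_ORDER);

// Greedy quantifier: with the s flag the dot also matches newlines,
// so the first match swallows everything up to the end of the input.
preg_match_all('@<!--\[(.*?)\]-->(.*)@su', $html, $greedy, PREG_SET_ORDER);

var_dump(count($lazy), $lazy[0][2]);     // 2 matches, group 2 is ""
var_dump(count($greedy), $greedy[0][2]); // 1 match, group 2 holds the rest of the input
```

Neither variant gives one section per comment tag, which is why the pattern needs something after .*? that tells it where to stop.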

So what we need is to make this part of the regex a bit more 'eager' - persuade it to keep devouring the string until it...

either encounters the beginning of the new comment ('<!--'),
or arrives at the end of the string.

And that's exactly expressed by this lookahead pattern:

(?=<!--|$)

It reads as 'match ONLY at a position that's either followed by a new comment, or is actually the end of the string'. The lookahead consumes nothing itself, so the next match can start right at that comment. And that's how it whips this lazy .*? sub-expression into shape - it's no longer able to stop wherever it alone wants to.
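
Putting it together, a sketch of the fixed pattern in use might look like the following. The helper name extract_sections and the inline copy of filter.html are assumptions made for illustration (the original reads the file with file_get_contents), and the trim() call is an optional extra to drop the newlines around each section:

```php
<?php
// Inline copy of the sample filter.html so the sketch runs on its own.
$html = "<!--[%TEST%]-->\nTEST\nTEST\n<!--[%DAS%]-->\nDAS TEST\n123456\n<!--[%BKK%]-->\nABCDEFG\nYXZ\n";

// Hypothetical helper: maps each bracketed tag name to the text that follows it.
function extract_sections(string $html): array
{
    $sections = [];
    // Group 1: text between the brackets. Group 2: everything up to the
    // next "<!--" or the end of the string, enforced by the lookahead.
    preg_match_all('@<!--\[(.*?)\]-->(.*?)(?=<!--|$)@su', $html, $found, PREG_SET_ORDER);
    foreach ($found as $match) {
        // Group 2 keeps the newlines surrounding the section, so trim them.
        $sections[$match[1]] = trim($match[2]);
    }
    return $sections;
}

print_r(extract_sections($html));
// %TEST% => "TEST\nTEST", %DAS% => "DAS TEST\n123456", %BKK% => "ABCDEFG\nYXZ"
```

Note that the lookahead stops at any '<!--', so an ordinary HTML comment inside a section would cut that section short. If that matters, the lookahead could be narrowed to (?=<!--\[|$).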
