I'm trying to get all the text up to the next occurrence of the comment tag, together with the text between the brackets of that comment tag. At the moment I only get the text between the brackets, not the content up to the next comment: that group only returns an empty string "". I'm kind of confused. Thanks!
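<?php
header("Content-Type: text/plain");
// Read the template that contains the <!--[...]--> markers
$tmp = file_get_contents("filter.html");
// Group 1: text inside the brackets; group 2: the content that should follow the tag
preg_match_all('@<!--\[(.*?)\]-->(.*?)@su', $tmp, $found, PREG_SET_ORDER);
var_dump($found);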
Contents of filter.html:
<!--[%TEST%]-->
TEST
TEST
<!--[%DAS%]-->
DAS TEST
123456
<!--[%BKK%]-->
ABCDEFG
YXZ
The output I get is:
array(3) {
  [0]=>
  array(3) {
    [0]=>
    string(15) "<!--[%TEST%]-->"
    [1]=>
    string(6) "%TEST%"
    [2]=>
    string(0) ""
  }
  [1]=>
  array(3) {
    [0]=>
    string(14) "<!--[%DAS%]-->"
    [1]=>
    string(5) "%DAS%"
    [2]=>
    string(0) ""
  }
  [2]=>
  array(3) {
    [0]=>
    string(14) "<!--[%BKK%]-->"
    [1]=>
    string(5) "%BKK%"
    [2]=>
    string(0) ""
  }
}
Solution: change the regex to:
@<!--\[(.*?)\]-->(.*?)(?=<!--|$)@su
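A minimal, self-contained sketch of the fixed pattern applied to the sample data. It assumes the contents of filter.html are inlined as a nowdoc string instead of being read with file_get_contents(), and the trim() call is an addition of this sketch (not part of the answer) that drops the newlines surrounding each block:

```php
<?php
// Assumption: filter.html is inlined here so the snippet runs on its own.
$tmp = <<<'HTML'
<!--[%TEST%]-->
TEST
TEST
<!--[%DAS%]-->
DAS TEST
123456
<!--[%BKK%]-->
ABCDEFG
YXZ
HTML;

// Fixed pattern: the lazy (.*?) must now extend up to a position that is
// followed by the next "<!--" or is the end of the subject.
preg_match_all('@<!--\[(.*?)\]-->(.*?)(?=<!--|$)@su', $tmp, $found, PREG_SET_ORDER);

foreach ($found as $set) {
    // $set[1] is the tag name, $set[2] the raw content up to the next tag;
    // trim() removes the newlines around that content.
    printf("%s => \"%s\"\n", $set[1], trim($set[2]));
}
```

Each entry's group 2 now holds the raw content, e.g. "\nTEST\nTEST\n" for %TEST%, which is why the sketch trims it before printing.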
Explanation: the original regex almost correctly used the .*? expression to capture the non-comment part. I say 'correctly' because the lazy modifier is indeed required here: without it, the greedy .* would happily consume the whole rest of the string, and the first match would swallow all the other comments. And I say 'almost' because the modifier is too lazy in this particular case. Since nothing follows it in the pattern, even an empty string is enough to satisfy it (just as '' matches /.*/). That's why you get those empty strings in $found: the victims of laziness taken to the extreme, they are.
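A tiny sketch of that effect in isolation, using a made-up subject "abc" rather than the question's data: a lazy group at the very end of a pattern matches zero characters, while the greedy version takes everything it can.

```php
<?php
// Lazy: nothing follows (.*?), so zero characters already satisfy the pattern.
preg_match('/a(.*?)/s', "abc", $lazy);
// Greedy: (.*) consumes as much as possible.
preg_match('/a(.*)/s', "abc", $greedy);

var_dump($lazy[1]);   // string(0) ""
var_dump($greedy[1]); // string(2) "bc"
```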
So what we need is to make this part of the regex a bit more 'eager': persuade it to keep devouring the string until it reaches something specific. That something is expressed by this lookahead pattern:
(?=<!--|$)
It reads as 'match ONLY at a position that is either followed by a new comment, or is the end of the string' (strictly, without the D modifier, $ also matches just before a trailing newline, which makes no difference here). That's how it whips the lazy .*? sub-expression into useful motion: it can no longer stop wherever it likes.
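A lookahead only checks what follows; it does not consume it. The sketch below compares the answer's pattern with a hypothetical variant that uses a plain non-capturing group (?:<!--|$) instead. The variable names and the inlined sample data are assumptions of this sketch. Because the variant eats the "<!--" of the next tag, matching resumes in the middle of that tag and every other comment is lost:

```php
<?php
// Assumption: filter.html is inlined here so the snippet runs on its own.
$tmp = <<<'HTML'
<!--[%TEST%]-->
TEST
TEST
<!--[%DAS%]-->
DAS TEST
123456
<!--[%BKK%]-->
ABCDEFG
YXZ
HTML;

// The answer's pattern: (?=...) checks for "<!--" without consuming it.
$withLookahead = '@<!--\[(.*?)\]-->(.*?)(?=<!--|$)@su';
// Hypothetical comparison: (?:...) consumes the "<!--" of the next tag.
$consuming = '@<!--\[(.*?)\]-->(.*?)(?:<!--|$)@su';

preg_match_all($withLookahead, $tmp, $a);
preg_match_all($consuming, $tmp, $b);

// $a[1] and $b[1] hold all group-1 captures (PREG_PATTERN_ORDER is the default).
echo implode(', ', $a[1]), "\n"; // %TEST%, %DAS%, %BKK%
echo implode(', ', $b[1]), "\n"; // %TEST%, %BKK%
```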