I'm trying to find everything between two tags, even when the same tag is nested inside it (this is a bit awkward to explain).
Here is an example:
this <tag id="1">is an <tag id="2">example</tag> for</tag> tags <tag id="3">in tags</tag>
I'm able to get everything between the tags, but not when the tags are nested like this.
My regex looks like: <tag id="(.*?)">(.*?)</tag>
How can I tell my regex to check whether there is another <tag> and, if so, to ignore the next </tag>? And to do that for any depth of nesting?
Since I'm working with PHP, is there perhaps another solution that is better and faster?
The output should look like:
id => content
- 1 => is an <tag id="2">example</tag> for
- 2 => example
- 3 => in tags
You already have a structured string with a well-known syntax, and PHP has built-in tools to parse this syntax. There's no reason to use a string approach with regex or with string functions.
In this example, I chose DOMDocument::loadHTML instead of DOMDocument::loadXML to get a more lenient parser, but if you have a well-formed XML document, you can use DOMDocument::loadXML.
I assume that each <tag> node has an id attribute, but if that isn't always the case, you can simply test for its presence with DOMElement::hasAttribute in the first foreach loop.
$html = 'this <tag id="1">is an <tag id="2">example</tag> for</tag> tags <tag id="3">in tags</tag>';

$dom = new DOMDocument;
// Collect libxml errors internally so the unknown <tag> element doesn't emit warnings
$state = libxml_use_internal_errors(true);
$dom->loadHTML($html);
// Restore the previous error-handling setting
libxml_use_internal_errors($state);

$nodeList = $dom->getElementsByTagName('tag');
$results = [];

foreach ($nodeList as $node) {
    // Build the inner content by serializing each child node (text or element)
    $content = '';
    foreach ($node->childNodes as $child) {
        $content .= $dom->saveHTML($child);
    }
    $results[$node->getAttribute('id')] = $content;
}

print_r($results);
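If some <tag> elements may lack an id, the presence check mentioned above could look roughly like the sketch below. It is the same approach applied to a hypothetical input string (not from the question) whose last tag has no id. The call to libxml_clear_errors() is an optional addition that empties the internal error buffer.

```php
<?php
// Hypothetical input: the last <tag> deliberately has no id attribute.
$html = 'a <tag id="1">one <tag id="2">two</tag></tag> b <tag>no id</tag>';

$dom = new DOMDocument;
$state = libxml_use_internal_errors(true); // keep warnings about the unknown <tag> element quiet
$dom->loadHTML($html);
libxml_clear_errors();                     // optional: discard the collected parser errors
libxml_use_internal_errors($state);

$results = [];
foreach ($dom->getElementsByTagName('tag') as $node) {
    // Skip elements without an id instead of storing them under an empty key.
    if (!$node->hasAttribute('id')) {
        continue;
    }
    $content = '';
    foreach ($node->childNodes as $child) {
        $content .= $dom->saveHTML($child);
    }
    $results[$node->getAttribute('id')] = $content;
}

print_r($results);
```

Skipping is only one possible policy. Depending on what you need, you could instead collect the id-less tags in a separate list.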