Find everything between tags

I'm trying to find everything between two tags, even when the same tag is nested inside it (awkwardly explained, I know).

Well here is an example: this <tag id="1">is an <tag id="2">example</tag> for</tag> tags <tag id="3">in tags</tag>.

I'm able to get everything between the tags, but not when the tags are nested like this.

My regex looks like: <tag id="(.*?)">(.*?)</tag>

How can I tell my regex to check whether there is a nested <tag> and, if so, skip the next </tag>? And do that to any nesting depth.

Since I'm working with PHP, is there perhaps another solution that is better and faster?

The output should look like:

id => content
- 1 => is an <tag id="2">example</tag> for
- 2 => example
- 3 => in tags

You have an already structured string with a well-known syntax, and PHP has built-in tools to parse this syntax. There's no reason to use a string-based approach with regex or string functions.

In this example, I chose DOMDocument::loadHTML instead of DOMDocument::loadXML to get a more lenient parser, but if you have a well-formed XML document, this change isn't needed.

I assume that each <tag> node has an id attribute, but if that isn't always the case, you can simply test for its presence using DOMElement::hasAttribute in the first foreach loop.

$html = 'this <tag id="1">is an <tag id="2">example</tag> for</tag> tags <tag id="3">in tags</tag>';

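$dom = new DOMDocument;
// Collect libxml warnings internally (e.g. about the unknown <tag> element)
// instead of emitting them, then restore the previous setting.
$state = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($state);

// Returns every <tag> element in document order, nested ones included.
$nodeList = $dom->getElementsByTagName('tag');

$results = [];

foreach ($nodeList as $node) {
    // Build the inner markup by serializing each child node in turn.
    $content = '';
    foreach ($node->childNodes as $child) {
        $content .= $dom->saveHTML($child);
    }
    $results[$node->getAttribute('id')] = $content;
}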

print_r($results);
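
If some <tag> nodes may lack an id attribute, the check mentioned above could look like the following. This is only a minimal sketch under stated assumptions: the function name collectTagContents and the sample input string are hypothetical and not part of the original code, and nodes without an id are simply skipped (you may prefer to handle them differently).

```php
<?php
// Hypothetical helper: returns the inner markup of every <tag> element,
// keyed by its id attribute. Elements without an id are skipped.
function collectTagContents(string $html): array
{
    $dom = new DOMDocument;
    // Keep libxml warnings internal, then restore the previous setting.
    $state = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($state);

    $results = [];
    foreach ($dom->getElementsByTagName('tag') as $node) {
        // Guard: ignore <tag> elements that carry no id attribute.
        if (!$node->hasAttribute('id')) {
            continue;
        }
        // Serialize each child node to rebuild the inner markup.
        $content = '';
        foreach ($node->childNodes as $child) {
            $content .= $dom->saveHTML($child);
        }
        $results[$node->getAttribute('id')] = $content;
    }
    return $results;
}

// Assumed sample input: the inner <tag> has no id and is left out of the result.
$results = collectTagContents('a <tag id="1">b <tag>c</tag></tag> d <tag id="3">e</tag>');
print_r(array_keys($results));
```

Note that PHP converts numeric string keys such as "1" to integers, so the ids end up as integer array keys here.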