Why does this DOMXPath query merge sibling node values?

Given the following code:

$html = "<h1>foo</h1><h2>bar</h2>";
$document = new DOMDocument();
$document->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($document);
$h1Nodes = $xpath->query('//h1');
foreach ($h1Nodes as $h1Node) {
    var_dump($h1Node->nodeValue);
}

The h1 tag contains only a text node with the text 'foo'. The text 'bar' is in a sibling heading node (h2). I would expect the output to be 'foo'.

However, the output is 'foobar'.

Why?

Thank you for your comment, hardik solanki.

It led me to the answer: valid markup must have a single root element.

The markup I provided doesn't have one, and the flags I used prevent the library from adding one implicitly. So the first tag is treated as the root element, the h2 that follows ends up nested inside it, and the result is a bit confusing.
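To check this explanation, one can ask the parsed document whether the h2 really sits inside the h1. This is only a minimal sketch reusing the markup and flags from the question; the `$nested` variable is introduced here for illustration. If the explanation is right, the count should be 1, and the serialized markup should show the h2 inside the h1.

```php
<?php
$html = "<h1>foo</h1><h2>bar</h2>";

$document = new DOMDocument();
// Same flags as in the question: no implied <html>/<body>, no default doctype.
$document->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($document);

// Count <h2> elements that are descendants of an <h1>.
$nested = $xpath->query('//h1//h2')->length;
var_dump($nested);

// Print the tree as libxml actually built it.
echo $document->saveHTML();
```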

Dropping those flags fixes this issue, but I use them for a reason. I want to manipulate a snippet of HTML, not a whole document, and get that snippet back after the transformations by calling DOMDocument::saveHTML(), without the doctype/<html>/<body> tags.

I've ended up doing this:

  • add doctype/<html>/<body> tags to the HTML snippet I want to manipulate, so that I temporarily have a valid document
  • load it with DOMDocument
  • transform it the way I need
  • save it with DOMDocument::saveHTML()
  • strip the extra doctype/<html>/<body> markup from the result

It works.
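The steps listed above can be sketched roughly as follows. This is not my exact code: `transformSnippet` and its `$transform` callback are hypothetical names made up for this example, and instead of stripping the wrapper with string functions, the sketch serializes only the children of `<body>` by passing each node to DOMDocument::saveHTML(), which has the same effect of leaving the doctype/<html>/<body> markup out.

```php
<?php
// Hypothetical helper: the name, signature and wrapper markup are assumptions.
function transformSnippet(string $snippet, callable $transform): string
{
    $document = new DOMDocument();

    // Wrap the snippet so the parser sees a complete document with one root.
    $document->loadHTML(
        '<!DOCTYPE html><html><body>' . $snippet . '</body></html>'
    );

    // Let the caller modify the document in place.
    $transform($document);

    // Serialize only the children of <body>, so the wrapper is left out.
    $body = $document->getElementsByTagName('body')->item(0);
    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $document->saveHTML($child);
    }

    return $result;
}

// Usage: with the wrapper in place, h1 and h2 are parsed as siblings.
$h1Values = [];
$output = transformSnippet(
    '<h1>foo</h1><h2>bar</h2>',
    function (DOMDocument $doc) use (&$h1Values): void {
        $xpath = new DOMXPath($doc);
        foreach ($xpath->query('//h1') as $h1Node) {
            $h1Values[] = $h1Node->nodeValue;
        }
    }
);

var_dump($h1Values);
echo $output;
```

Note that loadHTML() without a charset declaration may mangle non-ASCII text, so a real snippet might also need an encoding hint in the wrapper.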