This is similar to other questions asked here before, but I cannot figure out how to apply those suggestions, so I need some help.
I'd like to find the nodes of an HTML document which has a structure like this (an extract; the content can vary):
<h2>My title 1</h2>
<h3>Sub-heading</h3>
<p>...<span><a href='#'>...</a></span></p>
<div>...</div>
<h2>My title 2</h2>
<p>No sub-heading here :O</p>
<h3>But here</h3>
<p>No link</p>
<h2>And so on...</h2>
<p>...</p>
What I'd like to accomplish is to find all nodes from one h2 up to the last item before the next h2, including the h2 itself. For my example, I'd like to retrieve "blocks" like these:
Block 1:
<h2>My title 1</h2>
<h3>Sub-heading</h3>
<p>...<span><a href='#'>...</a></span></p>
<div>...</div>
Block 2:
<h2>My title 2</h2>
<p>No sub-heading here :O</p>
<h3>But here</h3>
<p>No link</p>
Block 3:
<h2>And so on...</h2>
<p>...</p>
I have nothing else to target (no id, no text content I could know about, no guaranteed content, etc.) apart from the h2 elements.
You can use DOMXPath and its query method.
First find all the h2 elements that are direct children of the body (not nested h2 elements).
Then start a foreach loop over every h2 found. Add that h2 to an array $set, because you want to keep it. Then loop over its following siblings and add those to $set until you reach the next h2.
Finally, add $set to the $sets array.
For example:
$html = <<<HTML
<h2>My title 1</h2>
<h3>Sub-heading</h3>
<p>...<span><a href='#'>...</a></span></p>
<div>...</div>
<h2>My title 2</h2>
<p>No sub-heading here :O</p>
<h3>But here</h3>
<p>No link</p>
<h2>And so on...</h2>
<p>...</p>
<div><h2>This is nested</h2></div>
HTML;
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$domNodeList = $xpath->query('/html/body/h2');
$sets = array();
foreach ($domNodeList as $element) {
    // Save the h2
    $set = array($element);
    // Loop over the siblings until the next h2
    while ($element = $element->nextSibling) {
        if ($element->nodeName === "h2") {
            break;
        }
        // Keep only DOMElement nodes (this skips whitespace text nodes)
        if ($element->nodeType === XML_ELEMENT_NODE) {
            $set[] = $element;
        }
    }
    $sets[] = $set;
}
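$sets will now contain 3 arrays, each holding the DOMElement nodes of one block. The nested h2 does not start a block of its own; its parent div is simply part of the last block.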
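If you need each block as an HTML string rather than as DOM nodes, something along these lines might work. This is only a sketch: the function name h2BlocksAsHtml and the short sample input are made up for illustration, and it assumes the h2 elements sit directly under the body, as in the example above. It relies on DOMDocument::saveHTML() accepting a node argument (available since PHP 5.3.6).

```php
<?php
// Hypothetical helper (not part of the original answer): groups the
// top-level elements into blocks starting at each <h2> and returns
// every block serialized as an HTML string.
function h2BlocksAsHtml(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $blocks = [];
    foreach ($xpath->query('/html/body/h2') as $heading) {
        // Start the block with the h2 itself.
        $parts = [trim($doc->saveHTML($heading))];
        // Walk the following siblings until the next h2 (or the end).
        for ($node = $heading->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeName === 'h2') {
                break;
            }
            // Keep element nodes only; whitespace text nodes are skipped.
            if ($node->nodeType === XML_ELEMENT_NODE) {
                $parts[] = trim($doc->saveHTML($node));
            }
        }
        $blocks[] = implode("\n", $parts);
    }
    return $blocks;
}

// Made-up sample input, just to show the call.
$blocks = h2BlocksAsHtml('<h2>A</h2><p>one</p><h2>B</h2><h3>sub</h3><p>two</p>');
foreach ($blocks as $i => $block) {
    echo 'Block ', $i + 1, ":\n", $block, "\n";
}
```

Serializing each node separately keeps the blocks independent of each other, so you can wrap or store them however you like.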