Example HTML:
<div class="classX">
<a href="#" class="aClass">Link Text 1</a>
<span class="sClass"><p>Text #1</p></span>
</div>
<div class="classX">
<a href="#" class="aClass">Link Text 2</a>
</div>
<div class="classX">
<a href="#" class="aClass">Link Text 3</a>
</div>
<div class="classX">
<a href="#" class="aClass">Link Text 4</a>
<span class="sClass"><p>Text #4</p></span>
</div>
<div class="classX">
<a href="#" class="aClass">Link Text 5</a>
<span class="sClass"><p>Text #5</p></span>
</div>
I'm trying to build an array that will look like:
[0] => Array
(
[link_text] => Link Text 1
[span_text] => Text #1
)
[1] => Array
(
[link_text] => Link Text 2
)
[2] => Array
(
[link_text] => Link Text 3
)
[3] => Array
(
[link_text] => Link Text 4
[span_text] => Text #4
)
[4] => Array
(
[link_text] => Link Text 5
[span_text] => Text #5
)
But using a foreach loop with a $key value pairs the items incorrectly, and instead I get an array that looks like this:
[0] => Array
(
[link_text] => Link Text 1
[span_text] => Text #1
)
[1] => Array
(
[link_text] => Link Text 2
[span_text] => Text #4
)
[2] => Array
(
[link_text] => Link Text 3
[span_text] => Text #5
)
[3] => Array
(
[link_text] => Link Text 4
)
[4] => Array
(
[link_text] => Link Text 5
)
I understand why this happens: I'm using the link_text key to index the span_text results, and the two node lists have different lengths (five links, but only three spans). What I can't figure out is how to build the array with the correct pairing.
PHP:
$finder = new DomXPath($dom);
$link_texts= $finder->query("//a[contains(@class, normalize-space('aClass'))]");
$span_text= $finder->query("//span[contains(@class,'sClass')]/@data-html");
foreach ($link_texts as $key => $link_text) {
    if (empty($span_text[$key]->textContent)) {
        $link_text = trim($link_text->textContent);
        $dataArr[] = str_replace("\n", " ", $link_text);
        $data[] = array("link_text"=>str_replace("\n", " ", $link_text));
    } else {
        $span_text = str_replace("\n", " ", $span_text[$key]->textContent);
        $span_text = preg_replace('~</?p[^>]*>~', '', $span_text);
        $link_text = trim($link_text->textContent);
        $data[] = array("link_text"=>str_replace("\n", " ", $link_text), "span_text"=>$span_text);
    }
}
I think it would be easier to start by selecting all the parent <div class="classX"> elements. Then we can select the nested a and span elements for each div.
$finder = new DomXPath($dom);
$divs = $finder->query("//div[@class='classX']");
$data = array();
foreach($divs as $div) {
    $link = $finder->query("./a[@class='aClass']", $div)->item(0);
    $span = $finder->query("./span[@class='sClass']", $div)->item(0);
    $items = array(
        "link_text" => $link ? $link->textContent : null,
        "span_text" => $span ? $span->textContent : null
    );
    $data[] = array_filter($items);
}
print_r($data);
This produces a $data array with all the link_text and span_text items in the correct order. Null values are removed by array_filter, so some nested arrays don't have a span_text key.
If a constant number of items is required, then don't filter the $items array.
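For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same per-div approach. A few things in it are my assumptions rather than part of the original question: the sample markup is trimmed to three blocks, wrapped in a hypothetical <root> element, and parsed with loadXML so the script runs on its own (with a real page you would keep whatever $dom you already have). It also builds both variants side by side: $fixed keeps both keys in every row, and $data drops only null values. The callback is there because array_filter without one removes every falsy value, so an empty string or "0" would be dropped as well.

```php
<?php
// Sample markup from the question, trimmed to three blocks; the middle one has no span.
// The <root> wrapper is only here so the fragment parses as a single XML document.
$xml = <<<'XML'
<root>
<div class="classX">
<a href="#" class="aClass">Link Text 1</a>
<span class="sClass"><p>Text #1</p></span>
</div>
<div class="classX">
<a href="#" class="aClass">Link Text 2</a>
</div>
<div class="classX">
<a href="#" class="aClass">Link Text 4</a>
<span class="sClass"><p>Text #4</p></span>
</div>
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$finder = new DOMXPath($dom);

$data  = array(); // rows with missing items dropped
$fixed = array(); // rows that always carry both keys

foreach ($finder->query("//div[@class='classX']") as $div) {
    // Relative queries: the second argument limits the search to this div.
    $link = $finder->query("./a[@class='aClass']", $div)->item(0);
    $span = $finder->query("./span[@class='sClass']", $div)->item(0);

    $items = array(
        "link_text" => $link ? trim($link->textContent) : null,
        "span_text" => $span ? trim($span->textContent) : null,
    );

    // Constant shape: keep the row as-is, nulls included.
    $fixed[] = $items;

    // Variable shape: remove only nulls, keeping values such as "" or "0".
    $data[] = array_filter($items, function ($value) {
        return $value !== null;
    });
}

print_r($data);
print_r($fixed);
```

Because each a and span is looked up inside its own div, a missing span can no longer shift the later values up by one position.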