From the HTML page https://www.topazlabs.com/downloads I want to extract the Topaz ReMask version number for Windows as a string: v5.0.1
I download the HTML with curl.
I then run an XPath query like this:
->finder->query("//div[contains(@class, 'wpb_wrapper')]/.//a[text()[contains(.,'Topaz ReMask')]]/../../../div");
OR
...->finder->query("//div[contains(@class, 'wpb_wrapper')]//a[text()[contains(.,'Topaz ReMask')]]/../../../div");
Then I go through all the returned DIV elements to find the one whose text contains both of the strings "/" and "(Win)", something like this: $versionString = Find($nodes, "/", "(Win)");
Finally, I process that text to extract only the Windows version.
It works, but can it be simplified?
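For reference, the steps described above might look roughly like the following sketch. The helper name findText is hypothetical (a stand-in for the Find function mentioned above, whose real code is not shown), the markup is a trimmed copy of the snippet from the page, and the regular expression is only one assumed way of doing the final extraction.

```php
<?php
// Trimmed sample of the page markup; in practice $html would be the
// source downloaded with curl.
$html = <<<'HTML'
<div class="wpb_wrapper">
<div class="mpc-textblock"><p><a href="/remask" target="blank">Topaz ReMask</a></p></div>
<div class="mpc-tooltip-wrap">
<div class="mpc-textblock"><p><em>v5.0.3 (Mac) / v5.0.1 (Win)
</em></p></div>
</div>
</div>
HTML;

// Hypothetical stand-in for Find($nodes, "/", "(Win)"): returns the trimmed
// text of the first node that contains every needle, or null if none does.
function findText(DOMNodeList $nodes, string ...$needles): ?string
{
    foreach ($nodes as $node) {
        $text = trim($node->textContent);
        $hasAll = true;
        foreach ($needles as $needle) {
            if (strpos($text, $needle) === false) {
                $hasAll = false;
                break;
            }
        }
        if ($hasAll) {
            return $text;
        }
    }
    return null;
}

libxml_use_internal_errors(true); // keep parser warnings out of the output
$document = new DOMDocument();
$document->loadHTML($html);
$finder = new DOMXPath($document);

// Same query as in the question: the sibling DIVs of the product title block.
$nodes = $finder->query(
    "//div[contains(@class, 'wpb_wrapper')]//a[text()[contains(.,'Topaz ReMask')]]/../../../div"
);

$text = findText($nodes, "/", "(Win)");

// Assumed extraction step: take the version token between "/" and "(Win)".
$version = null;
if ($text !== null && preg_match('~/\s*(v[\d.]+)\s*\(Win\)~', $text, $m)) {
    $version = $m[1];
}
var_dump($version); // string(6) "v5.0.1"
```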
The relevant part of the page's HTML is this:
...
<div class="wpb_wrapper">
<div class="vc_empty_space" style="height: 20px">
<span class="vc_empty_space_inner">
</span>
</div>
<div id="mpc_textblock-975b2251c2a82c7" class="mpc-textblock mpc-init mpc-typography--preset_2 ">
<p>
<a href="/remask" target="blank">Topaz ReMask</a>
</p>
</div>
<div class="mpc-tooltip-wrap" data-id="mpc_textblock-615b2251c2a8c4a">
<div id="mpc_textblock-615b2251c2a8c4a" class="mpc-textblock mpc-init mpc-typography--preset_0 ">
<p>
<em>v5.0.3 (Mac) / v5.0.1 (Win)
</em>
</p>
</div>
<div id="mpc_tooltip-925b2251c2a8d2f" class="mpc-tooltip mpc-init mpc-typography--preset_4 mpc-position--left mpc-can-hover mpc-trigger--hover ">Mac Updated November 4, 2016
<br>Windows Updated November 21, 2016
<div class="mpc-arrow">
</div>
</div>
</div>
<div id="mpc_textblock-475b2251c2a9601" class="mpc-textblock mpc-init ">
<p>The quickest and easiest way to mask your photo.
</p>
</div>
</div>
...
Well, you could base it on the text content only. Using DOMXpath::evaluate()
you can fetch the string directly:
// $html holds the page source downloaded with curl
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXpath($document);
// substring-after() returns a string, so evaluate() is used instead of query()
$expression = "substring-after(
//div[contains(.//p, 'Topaz ReMask')]//text()[starts-with(., 'Windows Updated ')],
'Windows Updated '
)";
var_dump($xpath->evaluate($expression));
Output:
string(24) "November 21, 2016
"
Xpath expression
Any div that has a p with the text Topaz ReMask:
//div[contains(.//p, 'Topaz ReMask')]
Any descendant text node of it that starts with Windows Updated:
//div[contains(.//p, 'Topaz ReMask')]//text()[starts-with(., 'Windows Updated ')]
The part of that text after Windows Updated:
substring-after(
//div[contains(.//p, 'Topaz ReMask')]//text()[starts-with(., 'Windows Updated ')],
'Windows Updated '
)
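The expression above returns the update date, while the question asks for the version string. The same string-function approach could probably be adapted to that; the following is a minimal sketch, assuming the version line keeps the form "vX (Mac) / vY (Win)" inside an em element as in the posted HTML. The sample markup is a trimmed copy of the snippet from the question.

```php
<?php
// Trimmed sample of the markup from the question; in practice $html would
// be the page source downloaded with curl.
$html = <<<'HTML'
<div class="wpb_wrapper">
<div class="mpc-textblock"><p><a href="/remask" target="blank">Topaz ReMask</a></p></div>
<div class="mpc-tooltip-wrap">
<div class="mpc-textblock"><p><em>v5.0.3 (Mac) / v5.0.1 (Win)
</em></p></div>
</div>
</div>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Take the em text that mentions "(Win)", keep what lies between "/" and
// "(Win)", then trim the surrounding whitespace with normalize-space().
$expression = "normalize-space(
  substring-before(
    substring-after(
      //div[contains(.//p, 'Topaz ReMask')]//em[contains(., '(Win)')],
      '/'
    ),
    '(Win)'
  )
)";
$version = $xpath->evaluate($expression);
var_dump($version); // string(6) "v5.0.1"
```

If the page ever lists the Windows version first or drops the "/" separator, this expression would return an empty string, so checking for that case before using the result seems prudent.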