I have this html code:
<html>
<div class="the_grp">
<h3>heading <span id="sn-sin" class="the_decs">(keyword: <i>cat</i>)</span></h3>
<ul>
<li>
<div>
<div><span class="w_pos"></span></div>
<div class="w_the">
<a href="http://www.exampledomain.com/20111/cute-cat">cute cat</a>,
<a href="http://www.exampledomain.com/7456/catty">catty</a>,
</div>
</div>
</li>
<li>
<div>
<div><span class="w_pos"></span></div>
<div class="w_the">
<a href="http://www.exampledomain.com/7589/sweet">sweet</a>,
<a href="http://www.exampledomain.com/10852/sweet-cat">sweet cat</a>,
<a href="http://www.exampledomain.com/20114/cat-vs-dog">cat vs dog</a>,
</div>
</li>
</ul>
</div>
<a id="ant"></a>
<div class="the_grp">
<h3>another heading <span id="sn-an" class="the_decs">(ignore this: <i>cat</i>)</span></h3>
<ul>
<li>
<div>
<div><span class="w_pos"></span></div>
<div class="w_the"><a href="http://www.exampledomain.com/118/bad-cat">bad cat</a></div>
</div>
</li>
</ul>
</div>
I want to match the following words from the HTML code: cute cat, catty, sweet, sweet cat, cat vs dog.
I'm using this pattern and capturing group 2 to get those words:
#<a href="http\:(.*?)">(.*?)<\/a>#i
My PHP code looks like this:
preg_match_all('#<a href="http\:(.*?)">(.*?)<\/a>#i', $data, $matches);
echo '<pre>';
print_r($matches[2]);
echo '</pre>';
That pattern matches "bad cat" too. How can I capture only the following words: cute cat, catty, sweet, sweet cat, cat vs dog?
Thanks in advance.
It would be best to just use an HTML parser. Here's how to do it with Simple HTML DOM (http://simplehtmldom.sourceforge.net/). If you are loading the page from a file or URL, file_get_html is preferable; it basically calls file_get_contents and then str_get_html. str_get_html is how you parse a string into a Simple HTML DOM object.
<?php
require('simple_html_dom.php');
// $data holds the HTML as a string, as in the question
$html = str_get_html($data);
// print the text of every link in the document
foreach($html->find('a') as $element)
echo $element->plaintext . '<br>';
?>
If you don't want "bad cat" to match, simply loop through the results and skip it:
foreach($html->find('a') as $element)
if ($element->plaintext != "bad cat")
echo $element->plaintext . '<br>';
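Comparing against the literal text "bad cat" only works while that exact link is the one to skip. As a rough sketch of a more structural alternative, the links can be selected by the heading they follow. Note that this swaps in PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM, and it assumes the wanted links are the ones whose nearest preceding h3 contains the span with id "sn-sin". The markup below is a trimmed copy of the question's HTML, used only for illustration.

```php
<?php
// Trimmed-down copy of the markup from the question (illustration only).
$data = <<<'HTML'
<div class="the_grp">
<h3>heading <span id="sn-sin" class="the_decs">(keyword: <i>cat</i>)</span></h3>
<ul>
<li><div class="w_the">
<a href="http://www.exampledomain.com/20111/cute-cat">cute cat</a>,
<a href="http://www.exampledomain.com/7456/catty">catty</a>,
</div></li>
<li><div class="w_the">
<a href="http://www.exampledomain.com/7589/sweet">sweet</a>,
<a href="http://www.exampledomain.com/10852/sweet-cat">sweet cat</a>,
<a href="http://www.exampledomain.com/20114/cat-vs-dog">cat vs dog</a>,
</div></li>
</ul>
</div>
<a id="ant"></a>
<div class="the_grp">
<h3>another heading <span id="sn-an" class="the_decs">(ignore this: <i>cat</i>)</span></h3>
<ul>
<li><div class="w_the"><a href="http://www.exampledomain.com/118/bad-cat">bad cat</a></div></li>
</ul>
</div>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($data);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Links inside div.w_the whose nearest preceding <h3> holds span#sn-sin.
// Assumption: the id "sn-sin" marks the group we want to keep.
$query = '//div[@class="w_the"]/a[preceding::h3[1]/span[@id="sn-sin"]]';

$words = [];
foreach ($xpath->query($query) as $link) {
    $words[] = trim($link->textContent);
}

echo implode(', ', $words), PHP_EOL;
```

With the sample above this should print the five wanted phrases and leave out "bad cat". Keying on the preceding heading rather than on the enclosing div is a deliberate choice here: it depends only on document order, so it is less sensitive to an unbalanced closing tag in the source markup.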