I want to get the links "http://www.w3schools.com/default.asp" and "http://www.google.com" from this webpage. I only want the links from the <a> tags inside <div class="link">. There are many other <a> tags on the page, and I don't want those. How can I retrieve only these particular links? Can anyone help me?
<div class="link">
<a href="http://www.w3schools.com/default.asp">
<h4>W3 Schools</h4>
</a>
</div>
<div class="link">
<a href="http://www.google.com">
<h4>Google</h4>
</a>
</div>
</div>
Use a DOM Parser such as DOMDocument to achieve this:
$dom = new DOMDocument;
$dom->loadHTML($html); // $html is a string containing the HTML
foreach ($dom->getElementsByTagName('a') as $link) {
echo $link->getAttribute('href').'<br/>';
}
Output:
http://www.w3schools.com/default.asp
http://www.google.com
UPDATE: If you only want the links inside the specific <div>, you can use an XPath expression to find the links inside the div, and then loop through them to get the href attribute:
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$links_inside_div = $xpath->query("//*[contains(@class, 'link')]/a");
foreach ($links_inside_div as $link) {
echo $link->getAttribute('href').'<br/>';
}
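One caveat with the query above: contains(@class, 'link') is a substring test, so it would also match a class such as "links" or "hyperlink", and the trailing /a only picks up anchors that are direct children of the div. The following is a minimal sketch of a stricter variant, not a definitive implementation. The function name linksInsideClass, the sample markup, and the extra "links" div are assumptions made up for illustration, and the class name is placed into the query unescaped, so it should not come from untrusted input.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the href of
// every <a> nested inside an element whose class list contains $class as a
// whole, space-separated token.
function linksInsideClass(string $html, string $class): array
{
    $dom = new DOMDocument;
    // Collect libxml parse warnings internally instead of printing them;
    // real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Pad the class attribute with spaces so that 'link' does not match 'links'.
    // '//a' selects anchors at any depth below the matching element.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]//a[@href]",
        $class
    );

    $hrefs = [];
    foreach ($xpath->query($query) as $link) {
        $hrefs[] = $link->getAttribute('href');
    }
    return $hrefs;
}

// Sample markup (assumed): the first div has class "links" and should be skipped.
$html = <<<HTML
<div class="links"><a href="http://example.com/other">Other</a></div>
<div class="link"><a href="http://www.w3schools.com/default.asp"><h4>W3 Schools</h4></a></div>
<div class="link"><a href="http://www.google.com"><h4>Google</h4></a></div>
HTML;

print_r(linksInsideClass($html, 'link'));
```

With this sample input, only the two anchors inside the class="link" divs should be returned. If the anchors are always direct children of the div, replacing //a with /a keeps the behaviour of the original query.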
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node)
{
echo $node->nodeValue.': '.$node->getAttribute("href")."\n";
}
You can use the Snoopy PHP class. Snoopy is a PHP class that simulates a web browser. It automates the task of retrieving web page content and posting forms: http://sourceforge.net/projects/snoopy/
Otherwise, try using jQuery:
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script>
<script type="text/javascript">
$( document ).ready(function() {
$( ".link a" ).each(function( index ) {
var link = $( this ).attr("href");
alert(link );
});
});
</script>
You can also get all the links on the page with plain JavaScript:
var list = document.getElementsByTagName("a");