How to get text links from an HTML page? [duplicate]

I want to get the links "http://www.w3schools.com/default.asp" and "http://www.google.com" from this webpage. I only want the links from the <a> tags inside <div class="link">; the page has many other <a> tags that I don't want. How can I retrieve only these particular links? Can anyone help me?

<div class="link">
<a href="http://www.w3schools.com/default.asp">
<h4>W3 Schools</h4>
</a>
</div>
<div class="link">
<a href="http://www.google.com">
<h4>Google</h4>
</a>
</div>
</div>

Use a DOM Parser such as DOMDocument to achieve this:

$dom = new DOMDocument;
$dom->loadHTML($html); // $html is a string containing the HTML

foreach ($dom->getElementsByTagName('a') as $link) {
    echo $link->getAttribute('href').'<br/>';
}

Output:

http://www.w3schools.com/default.asp
http://www.google.com


UPDATE: If you only want the links inside the specific <div>, you can use an XPath expression to find the links inside the div, and then loop through them to get the href attribute:

$dom = new DOMDocument;
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$links_inside_div = $xpath->query("//*[contains(@class, 'link')]/a");

foreach ($links_inside_div as $link) {
    echo $link->getAttribute('href').'<br/>';
}
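
One caveat about the expression above: contains(@class, 'link') is a substring test, so it would also match class values such as "links" or "hyperlink". In XPath 1.0 the usual token-safe form is contains(concat(' ', normalize-space(@class), ' '), ' link '). The sketch below, written in JavaScript purely for illustration, contrasts the two kinds of test; the function names substringMatch and tokenMatch are hypothetical and not part of any library.

```javascript
// Substring test: behaves like XPath contains(@class, 'link').
function substringMatch(classAttr, name) {
  return classAttr.includes(name);
}

// Whole-token test: behaves like
// contains(concat(' ', normalize-space(@class), ' '), ' link ').
function tokenMatch(classAttr, name) {
  return classAttr.trim().split(/\s+/).includes(name);
}

console.log(substringMatch("hyperlink", "link")); // true (a false positive)
console.log(tokenMatch("hyperlink", "link"));     // false
console.log(tokenMatch("box link", "link"));      // true
```

For the HTML in the question both forms select the same two links; the difference only matters if the page has other class names that contain "link".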

$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node)
{
  echo $node->nodeValue.': '.$node->getAttribute("href")."\n";
}

You can use the Snoopy PHP class. Snoopy simulates a web browser and automates the task of retrieving web page content and posting forms: http://sourceforge.net/projects/snoopy/

Otherwise, try using jQuery:

 <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script>
 <script type="text/javascript">
    $( document ).ready(function() {
         // Visit every <a> inside an element with class "link"
         $( ".link a" ).each(function( index ) {
             var link = $( this ).attr("href");
             alert(link);
         });
    });
</script>

You can also get all the links on the page with plain JavaScript:

 var list = document.getElementsByTagName("a");
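
Note that getElementsByTagName("a") returns every <a> on the page, not only those inside <div class="link">. A minimal sketch of restricting the result without jQuery, assuming a browser that supports querySelectorAll; the helper name getLinkHrefs and its root parameter are made up for illustration:

```javascript
// Collect the href of every <a> nested inside an element with class "link".
// `root` is assumed to be a Document or Element exposing querySelectorAll.
function getLinkHrefs(root) {
  var anchors = root.querySelectorAll(".link a");
  var hrefs = [];
  for (var i = 0; i < anchors.length; i++) {
    hrefs.push(anchors[i].getAttribute("href"));
  }
  return hrefs;
}

// Only runs where a DOM is available, i.e. in a browser.
if (typeof document !== "undefined") {
  console.log(getLinkHrefs(document));
}
```

getAttribute("href") returns the value exactly as written in the markup, whereas the href property of an anchor element would give the fully resolved URL.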