How do I extract 'href' and 'src' from image links using PHP?

I have code like this:

<a href='www.link_not_required.com'>
<a href='www.link_not_required.com'>
<a href='www.link_1.com'><img src='image_1.png'></a> 
<a href='www.link_2.com'><img src='image_2.png'></a> 
<a href='www.link_3.com'><img src='image_3.png'></a> 
<a href='www.link_4.com'><img src='image_4.png'></a> 
<img src='image_not_required.png'>
<img src='image_not_required.png'>

I want to extract the hrefs of only those anchors that contain images, along with the src of those images. I don't want the links of anchors that do not contain images, nor the srcs of images that are not inside anchors.

How do I do this? Can it be done using the Simplehtmldom library?

I'm not sure why you would want to access the contents of an HTML page using PHP, which is a server-side language. You could easily do this using JavaScript or jQuery.

However, let's say you read the contents of the HTML file/URL using some method (for example file_get_contents, cURL, readfile etc.) and wish to use the SimpleHTMLDom library. You could do the following:

  1. find all the images in the page and loop through them
  2. find the parent element of the selected item from above

Step #1 will give you all img tags, while step #2 will give you the corresponding parent anchor tags. You should be able to extract the required attributes.
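The two steps above can be sketched roughly as follows. This is only a minimal sketch: it assumes simple_html_dom.php sits next to the script, it uses a shortened, well-formed version of the sample markup, and the $pairs array is a hypothetical name introduced here purely for illustration.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Shortened, well-formed sample: one linked image and one stray image.
$str = "<a href='www.link_1.com'><img src='image_1.png'></a>"
     . "<img src='image_not_required.png'>";

$html = str_get_html($str);

$pairs = array(); // hypothetical container for href => src pairs

// Step 1: find every img tag in the document.
foreach ($html->find('img') as $img) {
    // Step 2: look at the element that directly contains the image.
    $parent = $img->parent();

    // Keep the image only if its direct parent is an anchor.
    if ($parent !== null && $parent->tag === 'a') {
        $pairs[$parent->href] = $img->src;
    }
}

foreach ($pairs as $href => $src) {
    echo $href . ':' . $src . "\n";
}
```

This only checks the direct parent, so an image wrapped in another element inside the anchor (a span, for instance) would be skipped; walking further up with parent() would be one way to handle that case.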

All of this is available at http://simplehtmldom.sourceforge.net/manual.htm and I don't think Googling or reading through the manual is that difficult.

It looks something like this:

require_once('simple_html_dom.php');
$str = <<<EOF
<a href='www.link_not_required.com'>
<a href='www.link_not_required.com'>
<a href='www.link_1.com'><img src='image_1.png'></a> 
<a href='www.link_2.com'><img src='image_2.png'></a> 
<a href='www.link_3.com'><img src='image_3.png'></a> 
<a href='www.link_4.com'><img src='image_4.png'></a> 
<img src='image_not_required.png'>
<img src='image_not_required.png'>
EOF;

$html = str_get_html($str);
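foreach ($html->find('a') as $a) {
  // find('img', 0) returns null when the anchor contains no image
  $img = $a->find('img', 0);
  if ($img === null) {
    continue; // skip anchors without an image
  }
  echo $a->href . ':' . $img->src . "\n";
}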

Note that the first two a tags in the sample are never closed, so the parser may nest the later elements inside them and the results can come out mangled.