Unable to automatically download all links as separate HTML files with PHP

I'm trying to write a PHP page that reads the source of a web page, finds all the links, and then, for each link that points to an HTML page, automatically downloads the file to my PC (preferably without asking where to save it).

This is my code:

<?php

$srcUrl= 'http://www.justdogbreeds.com/all-dog-breeds.html';

$html = file_get_contents($srcUrl);

$dom = new DOMDocument();
@$dom->loadHTML($html);

// grab all the links on the page
$xpath = new DOMXPath($dom);

//finding the a tag
$hrefs = $xpath->evaluate("/html/body//a");

$testo = '<table width="100%" border="1" cellspacing="2" cellpadding="2" summary="layout">
  <caption>
    List of links
  </caption>
  <tr>
    <th scope="col">&nbsp;</th>
        <th scope="col">&nbsp;</th>
  </tr>';

//Loop to display all the links and download
for ($i = 0; $i < $hrefs->length; $i++) {

       $href = $hrefs->item($i);
       $url = $href->getAttribute('href');

 //if real link
       if($url!='#')  

       {

 //Code to get the file...
 $data = file_get_contents($url);

 //save as?
 $filename = $url;

 /*save the file...
 $fh = fopen($filename,"w");
 fwrite($fh,$data);
 fclose($fh);*/

        $hfile = fopen($data ,"r");
        if($hfile){
            while(!feof($hfile)){
                $html=fgets($hfile,1024);
            }
        }
 $fh = fopen($filename,"w");
 fwrite($fh,$html);
 fclose($fh);

//download automatically (better if without asking where... maybe in download folder)
header('Content-disposition: attachment; filename=' . $filename);
header("Content-Type: application/force-download");
header('Content-type: text/html');

 //display link to the file you just saved...
    $testo.='<tr>
    <td>'.$url.'</td>
    <td></td>
    </tr>';
       }

}

$testo.='</table>';

echo $testo;

?>

What am I doing wrong? Thanks.

You are mixing up a couple of things. This is what your current code does:

  1. Load the content of the original URL.
  2. Find the links.
  3. For each link:
    1. Download the content ($data = file_get_contents($url);). This part is fine.
    2. Open a new file for reading ($hfile = fopen($data ,"r");). I'm not sure why you need this. It does nothing useful, because the "filename" you pass to fopen is the content downloaded in 3.1, and you don't need to read anything anyway: you already have the content of the URL.
    3. Write the content you just read to a file (the lines from $fh = fopen to fclose). There is a problem here: the name of the file you are trying to create is a URL (e.g. http://somedomain.sometld/somefile.html?t=1&r=2), and you cannot create a file with that name. You need to generate a valid filename instead, for example a random one.
    4. Send headers telling the browser to download an HTML file, using the name of the file you just saved.
      There are several problems here. First, the headers are sent once for every link found on the page, which you don't need; they should be sent only once. Second, you have the same filename problem as in 3.3.
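As an alternative to a random name for point 3.3, a readable filename can be derived from the URL itself. The following is only a sketch: filenameFromUrl is a hypothetical helper that is not part of the original code, and it assumes that the last path segment of the URL is a reasonable name for the file.

```php
<?php
// Hypothetical helper (not in the original code): turn a URL into a name
// that is safe to use as a local filename.
function filenameFromUrl(string $url): string
{
    // Keep only the path component, dropping scheme, host and query string
    $path = parse_url($url, PHP_URL_PATH);
    $base = is_string($path) ? basename($path) : '';

    // Replace anything outside a conservative character set
    $base = preg_replace('/[^A-Za-z0-9._-]/', '_', $base);

    // Fall back to a hash of the URL when the path has no usable name
    if ($base === '' || $base === '.' || $base === '..') {
        $base = md5($url);
    }

    // Make sure the name ends in .htm or .html
    if (!preg_match('/\.html?$/i', $base)) {
        $base .= '.html';
    }

    return $base;
}

echo filenameFromUrl('http://somedomain.sometld/somefile.html?t=1&r=2'), "\n"; // prints somefile.html
```

Two different URLs that end in the same path segment would map to the same filename, so a random or hashed name is still the safer choice when collisions matter.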

I made a few changes to your code, and it should work:

<?php
$srcUrl= 'http://www.justdogbreeds.com/all-dog-breeds.html';

$html = file_get_contents($srcUrl);

$dom = new DOMDocument();
@$dom->loadHTML($html);

// grab all the links on the page
$xpath = new DOMXPath($dom);

//finding the a tag
$hrefs = $xpath->evaluate("/html/body//a");

$testo = '<table width="100%" border="1" cellspacing="2" cellpadding="2" summary="layout">
  <caption>
    List of links
  </caption>
  <tr>
    <th scope="col">&nbsp;</th>
        <th scope="col">&nbsp;</th>
  </tr>';

// send the download headers only once, before any output
$filename = 'list-of-links.html';
header('Content-Disposition: attachment; filename="' . $filename . '"');
// a second Content-Type header would just replace the first, so send only this one
header('Content-Type: text/html');

//Loop to display all the links and download
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    //if real link
    if($url!='#') {
        //Code to get the file...
        $data = file_get_contents($url);

        //save as?
        $filename = mt_rand(10000000, 90000000) . ".html";
        file_put_contents($filename, $data);

        //display link to the file you just saved...
        $testo.='<tr>
        <td>'.$url.'</td>
        <td></td>
        </tr>';
    }
}
$testo.='</table>';
echo $testo;
?>
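
Note that the loop above passes each href straight to file_get_contents, which only fetches a remote page when the href is an absolute URL; a relative href such as akita.html (a made-up example) would be looked up as a local file. A minimal sketch of resolving hrefs against $srcUrl follows. resolveUrl is a hypothetical helper, and it deliberately does not normalize "../" segments or handle every case covered by the URL specifications.

```php
<?php
// Hypothetical helper (not in the original code): resolve an href found on
// the page at $base into an absolute http(s) URL, or return null to skip it.
function resolveUrl(string $base, string $href): ?string
{
    // Skip empty hrefs and pure in-page anchors
    if ($href === '' || $href[0] === '#') {
        return null;
    }
    // Already an absolute http(s) URL
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    // Any other scheme (mailto:, javascript:, ...) is skipped
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return null;
    }

    $parts = parse_url($base);
    if (!isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative: //host/path
    if (strncmp($href, '//', 2) === 0) {
        return $parts['scheme'] . ':' . $href;
    }
    // Root-relative: /path
    if ($href[0] === '/') {
        return $origin . $href;
    }
    // Document-relative: resolve against the directory of the base path
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir . $href;
}

echo resolveUrl('http://www.justdogbreeds.com/all-dog-breeds.html', 'akita.html'), "\n";
```

Inside the loop, $url = resolveUrl($srcUrl, $href->getAttribute('href')); followed by a null check could then replace the $url != '#' test.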

I would recommend adding a sleep of a few seconds after each request to make sure you don't put too much pressure on the server.
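
One possible way to add that pause is sketched below. fetchAllPolitely is a hypothetical helper, and the two-second default is an arbitrary assumption, not a value the target server asks for. The fetch function is passed in as a parameter so that the delay logic can be tried without touching the network.

```php
<?php
// Hypothetical helper (not in the original code): fetch each URL in turn,
// sleeping between consecutive requests to go easy on the remote server.
function fetchAllPolitely(array $urls, callable $fetch, int $delaySeconds = 2): array
{
    $results = [];
    $urls = array_values($urls);   // re-index so $i counts from 0
    $last = count($urls) - 1;

    foreach ($urls as $i => $url) {
        $results[$url] = $fetch($url);

        // No need to wait after the final request
        if ($i < $last) {
            sleep($delaySeconds);
        }
    }

    return $results;
}
```

With the real downloader this could be called as fetchAllPolitely($urls, 'file_get_contents', 3), where $urls is the list of links collected from the page.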