I'm trying to write a PHP page that reads a webpage's source, finds all the links and then, for each link that points to an HTML page, automatically downloads the file to my PC (preferably without asking where to save it).
This is my code:
<?php
$srcUrl= 'http://www.justdogbreeds.com/all-dog-breeds.html';
$html = file_get_contents($srcUrl);
$dom = new DOMDocument();
@$dom->loadHTML($html);
// grab all the links on the page
$xpath = new DOMXPath($dom);
//finding the a tag
$hrefs = $xpath->evaluate("/html/body//a");
$testo = '<table width="100%" border="1" cellspacing="2" cellpadding="2" summary="layout">
<caption>
List of links
</caption>
<tr>
<th scope="col"> </th>
<th scope="col"> </th>
</tr>';
//Loop to display all the links and download
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$url = $href->getAttribute('href');
//if real link
if($url!='#')
{
//Code to get the file...
$data = file_get_contents($url);
//save as?
$filename = $url;
/*save the file...
$fh = fopen($filename,"w");
fwrite($fh,$data);
fclose($fh);*/
$hfile = fopen($data ,"r");
if($hfile){
while(!feof($hfile)){
$html=fgets($hfile,1024);
}
}
$fh = fopen($filename,"w");
fwrite($fh,$html);
fclose($fh);
//download automatically (better if without asking where... maybe in download folder)
header('Content-disposition: attachment; filename=' . $filename);
header("Content-Type: application/force-download");
header('Content-type: text/html');
//display link to the file you just saved...
$testo.='<tr>
<td>'.$url.'</td>
<td></td>
</tr>';
}
}
$testo.='</table>';
echo $testo;
?>
What am I doing wrong? Thanks.
You are mixing up a couple of things. This is what your current code does for each link:
1. Fetches the content of the URL ($data = file_get_contents($url);) - this is good.
2. Calls $hfile = fopen($data ,"r"); - not sure why you need this. It does nothing useful, because the name of the file you try to open is the content from step 1, and you don't need to read anything: you already have the content of the URL.
3. Writes the content to a file (fopen -> fclose). However, you have a problem here: the name of the file you are trying to create is a URL (e.g. http://somedomain.sometld/somefile.html?t=1&r=2), and you cannot create a file with that name. You need to create a random filename.
I made a few changes to your code, and it should work:
<?php
$srcUrl= 'http://www.justdogbreeds.com/all-dog-breeds.html';
$html = file_get_contents($srcUrl);
$dom = new DOMDocument();
@$dom->loadHTML($html);
// grab all the links on the page
$xpath = new DOMXPath($dom);
//finding the a tag
$hrefs = $xpath->evaluate("/html/body//a");
$testo = '<table width="100%" border="1" cellspacing="2" cellpadding="2" summary="layout">
<caption>
List of links
</caption>
<tr>
<th scope="col"> </th>
<th scope="col"> </th>
</tr>';
$filename = 'list-of-links.html';
// send the generated table as a download named list-of-links.html
// (only one Content-Type can take effect; a second call replaces the first)
header('Content-Disposition: attachment; filename="' . $filename . '"');
header('Content-Type: text/html');
//Loop to display all the links and download
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$url = $href->getAttribute('href');
//if real link
if($url!='#') {
//Code to get the file...
$data = file_get_contents($url);
//save as?
$filename = mt_rand(10000000, 90000000) . ".html";
file_put_contents($filename, $data);
//display link to the file you just saved...
$testo.='<tr>
<td>'.$url.'</td>
<td></td>
</tr>';
}
}
$testo.='</table>';
echo $testo;
?>
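If you would rather have recognizable names than purely random ones, the filename can be derived from the URL itself. The following is only a minimal sketch: urlToFilename is a hypothetical helper that is not part of the code above, and it assumes every saved page should get an .html extension.

```php
<?php
// Hypothetical helper (an assumption, not part of the original answer):
// turn a URL into a name that is safe to use as a local filename.
function urlToFilename(string $url): string
{
    // Keep only the path part, e.g. "/somefile.html" (query string dropped).
    $path = (string) parse_url($url, PHP_URL_PATH);
    // Last path segment without its extension, e.g. "somefile".
    $base = pathinfo($path, PATHINFO_FILENAME);
    // Replace anything other than letters, digits, "_" and "-".
    $base = preg_replace('/[^A-Za-z0-9_-]+/', '_', $base);
    if ($base === '' || $base === null) {
        $base = 'page'; // fallback for URLs without a usable path
    }
    // A short hash of the full URL keeps different URLs from colliding.
    return $base . '-' . substr(md5($url), 0, 8) . '.html';
}

echo urlToFilename('http://somedomain.sometld/somefile.html?t=1&r=2'), "\n";
```

In the loop you could then write file_put_contents(urlToFilename($url), $data); in place of the mt_rand() line.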
I would recommend adding a sleep of a few seconds after each request to make sure you don't put too much pressure on the server.
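A rough sketch of what that pause could look like. fetchAllPolitely is a hypothetical name, the 2-second default delay is an arbitrary assumption, and the $fetch parameter exists only so the loop can be tried without hitting a real server.

```php
<?php
// Hypothetical helper (an assumption, not part of the original answer):
// fetch each URL in turn, sleeping between requests.
function fetchAllPolitely(array $urls, int $delaySeconds = 2, ?callable $fetch = null): array
{
    // Default fetcher: file_get_contents, with warnings suppressed so that a
    // failed request simply yields false.
    $fetch = $fetch ?? static function (string $url) {
        return @file_get_contents($url);
    };

    $pages = [];
    foreach (array_values($urls) as $i => $url) {
        // Pause before every request except the first one.
        if ($i > 0 && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
        $data = $fetch($url);
        // Skip links that could not be downloaded.
        if ($data !== false) {
            $pages[$url] = $data;
        }
    }
    return $pages; // map of URL => page content
}
```

For example, $pages = fetchAllPolitely($links, 3); would wait three seconds between downloads, where $links is the array of href values you collected.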