I want to retrieve all links from www.gifgif.ir.
All of the links I need contain /product, and there are 360 of them, but I only get 37.
My code is:
<?php
/**
* Created by PhpStorm.
* User: saleh
* Date: 10/16/17
* Time: 9:58 PM
*/
set_time_limit(0); // 0 means no execution time limit
header('Content-Type: text/html; charset=utf-8');
// simple_html_dom provides file_get_html() and the find() selector API
include('/home/saleh/Downloads/simple_html_dom_1_5/simple_html_dom.php');
// get DOM from URL
$html = file_get_html('http://www.gifgif.ir/');
$seen = []; // hrefs already printed, used as a set
// find all <a> tags and print each unique href that contains "/product"
foreach ($html->find('a') as $e) {
    if (isset($e->attr['href']) && strpos($e->attr['href'], '/product') !== false) {
        $href = $e->attr['href'];
        if (!isset($seen[$href])) {
            echo $href . "<br>\n";
            $seen[$href] = true;
        }
    }
}
?>
But it returns only 10 links. What should I do to get all of them?
http://www.gifgif.ir/product/pId-2HnAXuEJBRdsM4
http://www.gifgif.ir/product/pId-TeYhzl2oPwnIgr
http://www.gifgif.ir/product/pId-KoYUDejZa7Jc9m
http://www.gifgif.ir/product/pId-r1H0kayBexIcXF
http://www.gifgif.ir/product/pId-FaLdA5P4WqDyXi
http://www.gifgif.ir/product/pId-lYXV65Fw0NzB3e
http://www.gifgif.ir/product/pId-Gc1uxSp6tHFmhi
http://www.gifgif.ir/product/pId-Qe3TZltc2WEpvj
http://www.gifgif.ir/product/pId-ybZ2kPLewHojsd
http://www.gifgif.ir/product/pId-yJS8czqOMT7vjB
The content on the page is loaded dynamically: the page as first served only contains a subset of the actual HTML. If you watch the Network tab in your browser's developer tools, you'll see that it loads more content (the getList call) as you scroll.
You'll have to work around that by making requests to the actual endpoint that loads the content, rather than just reading the initial HTML served by the page. Since I'm guessing the site's creators don't want their API exposed publicly, I'm not going to write code that actually does this, but you should be able to create a loop that makes calls to getList, parses the HTML and extracts the relevant data.
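The loop described above can be sketched roughly as follows. This is a minimal sketch, assuming each request to the content-loading endpoint returns an HTML fragment and that results are requested page by page. It does not target the site's real API: the page fetcher is passed in as a callable, and the function names `extractProductLinks` / `collectAllProductLinks`, the placeholder URL and the `page` parameter are all hypothetical. It parses with PHP's built-in `DOMDocument` instead of simple_html_dom.

```php
<?php
// Sketch only: function names, the example URL and the "page" parameter
// are hypothetical placeholders, not the site's actual getList API.

/**
 * Return the unique hrefs in an HTML fragment that contain "/product".
 */
function extractProductLinks(string $html): array
{
    $links = [];
    if (trim($html) === '') {
        return $links; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them;
    // fragments are often not well-formed documents.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href'); // "" if the attribute is missing
        if (strpos($href, '/product') !== false) {
            $links[$href] = true; // array keys act as a set
        }
    }
    return array_keys($links);
}

/**
 * Call $fetchPage(1), $fetchPage(2), ... until a page adds no new links
 * (or $maxPages is reached) and return every unique product link.
 *
 * @param callable $fetchPage function (int $page): string, returns an HTML fragment
 */
function collectAllProductLinks(callable $fetchPage, int $maxPages = 100): array
{
    $all = [];
    for ($page = 1; $page <= $maxPages; $page++) {
        $before = count($all);
        foreach (extractProductLinks($fetchPage($page)) as $href) {
            $all[$href] = true;
        }
        if (count($all) === $before) {
            break; // nothing new on this page: assume the list has ended
        }
    }
    return array_keys($all);
}

// Hypothetical usage (the URL and query parameter are placeholders):
// $links = collectAllProductLinks(function (int $page): string {
//     $body = file_get_contents('http://example.com/getList?page=' . $page);
//     return $body === false ? '' : $body;
// });
// echo implode("<br>\n", $links);
```

Stopping when a page adds no new links is only a heuristic; if the real endpoint signals the end explicitly (for example with an empty response or a flag), checking for that would be more reliable. `$maxPages` guards against looping forever.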