I'm talking about deep recursion that runs for around 5+ minutes, the kind of thing you might have a crawler perform in order to extract URL links and sub-URL links from pages. Deep recursion like that does not seem realistic in PHP, e.g.:
getInfo("www.example.com");
function getInfo($link){
$content = file_get_content($link)
if($con = $content->find('.subCategories',0)){
echo "go deeper<br>";
getInfo($con->find('a',0)->href);
}
else{
echo "reached deepest<br>";
}
}
Doing something like this with recursion is a bad idea in any language: you cannot know in advance how deep the crawler will go, so it can blow the call stack. Even if it doesn't, it wastes a lot of memory on the huge stack, because PHP does not optimize tail calls and therefore keeps a stack frame alive for every pending call.
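As a minimal illustration (not your crawler, just the failure mode): the exact error depends on configuration, typically an Xdebug nesting-level abort if Xdebug is loaded, otherwise frames pile up until memory_limit is exhausted.

// Illustration only: every call keeps its frame, since PHP has no tail-call optimization.
function descend($depth = 0) {
    return descend($depth + 1);
}
descend();   // eventually a fatal error (nesting limit or memory exhaustion), never a clean return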
Instead, push the URLs you find into a "to crawl" queue that is processed iteratively:
$queue = array('http://www.example.com');   // URLs still to visit
$done  = array();                           // URLs already visited

while ($queue) {
    $link   = array_shift($queue);
    $done[] = $link;

    $content = file_get_html($link);        // Simple HTML DOM, as in the question
    if ($con = $content->find('.subCategories', 0)) {
        $sublink = $con->find('a', 0)->href;
        // only enqueue links we have not seen yet
        if (!in_array($sublink, $done) && !in_array($sublink, $queue)) {
            $queue[] = $sublink;
        }
    }
}
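If you would rather not depend on Simple HTML DOM, the same queue-based crawl can be sketched with PHP's built-in DOM extension. The '.subCategories' selector and starting URL are simply the ones from the question, and the page cap is an arbitrary safeguard:

// Same technique, using DOMDocument/DOMXPath instead of Simple HTML DOM.
$queue = array('http://www.example.com');
$done  = array();

while ($queue && count($done) < 1000) {      // hard cap so the crawl always terminates
    $link   = array_shift($queue);
    $done[] = $link;

    $html = @file_get_contents($link);
    if ($html === false) {
        continue;                            // skip pages that fail to load
    }

    $dom = new DOMDocument();
    @$dom->loadHTML($html);                  // suppress warnings from sloppy markup
    $xpath = new DOMXPath($dom);

    // first <a> href inside an element with class "subCategories"
    $nodes = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " subCategories ")]//a/@href');
    if ($nodes->length > 0) {
        $sublink = $nodes->item(0)->nodeValue;
        if (!in_array($sublink, $done) && !in_array($sublink, $queue)) {
            $queue[] = $sublink;             // enqueue instead of recursing
        }
    }
}

Because the queue holds plain strings, memory use grows with the number of discovered URLs rather than with the depth of the traversal, and the $done list keeps the crawler from looping on pages that link back to each other.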