I am processing up to about 3000 links and using curl_multi to speed this up. So far I have used a simple batch approach: start 20 transfers, wait for all 20 to finish, then start the next 20. I know this is inefficient, especially if one of those 20 links takes a long time to download. I need to know how to write a loop that goes through all 3000 links, removing each handle as soon as I have the contents from its URL and adding the next URL in its place.
These are the basics I am working with:
define('RUNATONCE', 20); // Links to process at a time
// My URLs are held in a multidimensional array:
// the first dimension is about 1000 and the second dimension is 3
$allurls[0][0];
I need to be able to:
1) Check when a handle is done, and know which URL in my multidimensional array that handle belongs to
2) Retrieve the contents of that handle and choose how to process them based on whether the URL came from $allurls[0][0], $allurls[0][1], or $allurls[0][2] (a different process for each of those)
3) Remove that handle and add another URL from $allurls, until all links have been processed
4) Apply a manual timeout to any URL that has been taking more than a certain amount of time, say 2 minutes (because CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT do not work properly in a curl_multi environment, or at least that is my experience and understanding based on http://curl.haxx.se/mail/curlphp-2008-06/0006.html). I also need to record in $allurls that the URL timed out.
I know this seems like a lot to ask, but for someone who knows curl_multi it shouldn't be much work. I just don't know the specifics of how to do it. Thanks.
I had a similar situation where I needed to validate certain URLs. I found two solutions: make PHP fork a new process using pcntl, if it is installed, or use AJAX to request the PHP page that validates the URL (this is ugly, but it is what I settled for, since pcntl isn't installed on the server). I have the timeout set to 30 seconds, so even if something takes a long time, it doesn't matter.
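For the rolling window the question describes, a minimal sketch might look like the following. It assumes PHP 7 or newer with the cURL extension. fetch_rolling, MANUAL_TIMEOUT, and the 'status'/'body' result layout are hypothetical names chosen for illustration, not part of any library. Each handle is tagged with its array position via CURLOPT_PRIVATE, so a finished handle can be mapped back to $allurls[$i][$j], and the manual timeout is tracked with microtime() instead of relying on CURLOPT_TIMEOUT.

```php
<?php
define('RUNATONCE', 20);       // links to process at a time
define('MANUAL_TIMEOUT', 120); // assumed limit: seconds before a transfer is abandoned

// Hypothetical helper: fetches every $allurls[$i][$j] through a rolling window.
// Returns $results[$i][$j] = ['status' => 'done'|'error'|'timeout', 'body' => string|null].
function fetch_rolling(array $allurls, $window = RUNATONCE, $timeout = MANUAL_TIMEOUT)
{
    // Flatten the array into a queue that remembers each URL's position.
    $queue = [];
    foreach ($allurls as $i => $row) {
        foreach ($row as $j => $url) {
            $queue[] = [$i, $j, $url];
        }
    }

    $mh = curl_multi_init();
    $active = [];  // "i:j" => ['ch' => handle, 'pos' => [i, j], 'start' => float]
    $results = [];

    // Moves the next queued URL (if any) into the multi handle.
    $add = function () use (&$queue, &$active, $mh) {
        if (!$queue) {
            return;
        }
        list($i, $j, $url) = array_shift($queue);
        $key = "$i:$j";
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_PRIVATE, $key); // tag the handle with its position
        curl_multi_add_handle($mh, $ch);
        $active[$key] = ['ch' => $ch, 'pos' => [$i, $j], 'start' => microtime(true)];
    };

    // Records the outcome for one transfer, frees its slot, and refills the slot.
    $finish = function ($key, $status, $body) use (&$active, &$results, $mh, $add) {
        list($i, $j) = $active[$key]['pos'];
        curl_multi_remove_handle($mh, $active[$key]['ch']);
        unset($active[$key]);
        $results[$i][$j] = ['status' => $status, 'body' => $body];
        $add();
    };

    // Fill the initial window.
    for ($n = 0; $n < $window; $n++) {
        $add();
    }

    while ($active) {
        curl_multi_exec($mh, $running);

        // Points 1-3: detect finished handles, map them back, replace them.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $key = curl_getinfo($ch, CURLINFO_PRIVATE);
            $ok = ($info['result'] === CURLE_OK);
            $finish($key, $ok ? 'done' : 'error', $ok ? curl_multi_getcontent($ch) : null);
        }

        // Point 4: abandon anything that has been running longer than $timeout.
        foreach ($active as $key => $transfer) {
            if (microtime(true) - $transfer['start'] > $timeout) {
                $finish($key, 'timeout', null);
            }
        }

        // Wait for activity, but at most one second, so the timeout check
        // above keeps running even when every transfer is stalled.
        if ($active && curl_multi_select($mh, 1.0) === -1) {
            usleep(100000);
        }
    }

    curl_multi_close($mh);
    return $results;
}
```

The caller could then loop over the returned array and dispatch on $j (0, 1, or 2) to pick the right processing routine, treating any entry whose status is 'timeout' as a URL that timed out.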