Possible Duplicate:
cURL Mult Simultaneous Requests (domain check)
I'm trying to check whether a website exists (if it responds, that's good enough). The issue is that my array contains 20,000 domains, and I'm trying to speed up the process as much as possible.
I've done some research and come across this page which details simultaneous cURL requests -> http://www.phpied.com/simultaneuos-http-requests-in-php-with-curl/
I also found this page, which seems to be a good way of checking whether a domain's webpage is up -> http://www.wrichards.com/blog/2009/05/php-check-if-a-url-exists-with-curl/
Any ideas on how to quickly check 20,000 domains to see if they are up?
Check out RollingCurl.
It allows you to execute multiple curl requests in parallel, keeping a fixed number of them in flight at a time. Here is an example:
require 'curl/RollingCurl.php';
require 'curl/RollingCurlGroup.php';

// handle_response() is called once for each completed request
$rc = new RollingCurl('handle_response');
$rc->window_size = 2; // number of simultaneous requests
foreach ($domain_array as $domain => $value)
{
    $request = new RollingCurlRequest($value);
    // echo $value . "\n";
    $rc->add($request);
}
$rc->execute();

function handle_response($response, $info)
{
    if ($info['http_code'] === 200)
    {
        // site exists; handle response data
    }
}
$http = curl_init($url);
curl_setopt($http, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$result = curl_exec($http);
$http_status = curl_getinfo($http, CURLINFO_HTTP_CODE);
curl_close($http);
if ($http_status == 200) {
    // good here
}
You can use curl_multi requests, but you probably want to limit them to 10 or so at a time. You would have to track jobs in a separate database for processing the queue: Threads in PHP
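A minimal sketch of the "10 at a time" idea, using PHP's built-in curl_multi functions. The function name check_domains() and its parameters are hypothetical, and it works in fixed batches rather than a true rolling window, so one slow host holds up the rest of its batch. It assumes each domain answers on plain http:// and sends HEAD-style requests (CURLOPT_NOBODY) to avoid downloading bodies.

```php
<?php
// Sketch only: check_domains() is a hypothetical helper, not part of any library.
// Returns an array of domain => HTTP status code (0 means no HTTP response).
function check_domains(array $domains, int $batchSize = 10, int $timeout = 5): array
{
    $results = [];
    foreach (array_chunk($domains, $batchSize) as $batch) {
        $mh = curl_multi_init();
        $handles = [];
        foreach ($batch as $domain) {
            $ch = curl_init('http://' . $domain . '/');
            curl_setopt_array($ch, [
                CURLOPT_NOBODY         => true,     // HEAD request: skip the body
                CURLOPT_RETURNTRANSFER => true,     // never echo output
                CURLOPT_CONNECTTIMEOUT => $timeout, // seconds to wait for a connection
                CURLOPT_TIMEOUT        => $timeout, // seconds for the whole request
            ]);
            curl_multi_add_handle($mh, $ch);
            $handles[$domain] = $ch;
        }

        // Drive every transfer in this batch until none are still running.
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running > 0) {
                curl_multi_select($mh, 1.0); // wait for activity instead of spinning
            }
        } while ($running > 0 && $status === CURLM_OK);

        foreach ($handles as $domain => $ch) {
            $results[$domain] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
    }
    return $results;
}
```

Calling check_domains($domain_array) would give you one status code per domain; treating any non-zero code as "up" matches the "if it responds that's good enough" requirement, while comparing against 200 is stricter.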
I think that if you really want to speed up the process and save a lot of bandwidth (as I understand it, you plan to check availability on a regular basis), then you should work with sockets, not with curl. You can open several sockets at a time and handle each one asynchronously. Then, instead of a "GET / HTTP/1.0" request, send a "HEAD / HTTP/1.0" request with a "Host: $sitename" header. It will return the same status code a GET request would, but without the response body. You only need to parse the first line of the response to get an answer, so you could just preg_match it against the good response codes. As one extra optimization, your code will eventually learn which sites sit on the same IPs, so you can cache the name mappings and order the list by IP. Then you can check several sites over one connected socket (remember to add the 'Connection: keep-alive' header).
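The pieces of this approach could be sketched roughly as follows. All function names here are hypothetical. head_status() is a blocking, one-host-at-a-time version for illustration only; handling many sockets at once would need something like stream_select(), which is not shown. group_by_ip() uses gethostbyname() by default, which returns the host name unchanged when resolution fails.

```php
<?php
// Sketch only: every function below is a hypothetical helper.

// Build a minimal HEAD request for the root path of $host.
function build_head_request(string $host, bool $keepAlive = false): string
{
    $lines = [
        'HEAD / HTTP/1.0',
        'Host: ' . $host,
        'Connection: ' . ($keepAlive ? 'keep-alive' : 'close'),
    ];
    return implode("\r\n", $lines) . "\r\n\r\n";
}

// Extract the status code from the first response line, or null if malformed.
function parse_status_code(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\d\.\d\s+(\d{3})\b#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Group hosts by resolved IP so hosts on the same server end up together.
// $resolver can be swapped out, for example for a cached lookup table.
function group_by_ip(array $hosts, ?callable $resolver = null): array
{
    $resolver = $resolver ?? 'gethostbyname';
    $groups = [];
    foreach ($hosts as $host) {
        $groups[$resolver($host)][] = $host;
    }
    ksort($groups);
    return $groups;
}

// Send one HEAD request over a plain socket; returns the status code or null.
function head_status(string $host, float $timeout = 5.0): ?int
{
    $fp = @fsockopen($host, 80, $errno, $errstr, $timeout);
    if ($fp === false) {
        return null; // could not connect
    }
    stream_set_timeout($fp, (int) ceil($timeout));
    fwrite($fp, build_head_request($host));
    $line = fgets($fp, 1024); // only the status line is needed
    fclose($fp);
    return $line === false ? null : parse_status_code($line);
}
```

For example, head_status('example.com') would return an integer status code if the host answers on port 80 and null otherwise. Keeping the parsing and grouping separate from the socket code makes them easy to reuse if you later move to non-blocking sockets.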