I've written a PHP script that fetches the source code of a web page (http://primaire.recitus.qc.ca/sujets/13/personnages-marquants/3972) using cURL. At first everything worked like a charm, but after a while I started to stumble upon some websites that could not be fetched. The error returned was often something like:
Failed connect to primaire.recitus.qc.ca:80; Connection timed out
But the website itself loads quite fast. I raised the timeout setting to as much as a minute, but that didn't help, so I figured the problem was not with my script. I then checked on this website to see if they could fetch it, but they couldn't either. Here is the script I used:
<?php
header("Access-Control-Allow-Origin: *");
header('Content-type: text/plain');
$input = "http://primaire.recitus.qc.ca/sujets/13/personnages-marquants/3972";
$method = $_SERVER['REQUEST_METHOD'];
$ch = curl_init($input);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15');
curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com');
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4 );
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// CURLOPT_BINARYTRANSFER has had no effect since PHP 5.1.3, and CURLOPT_RETURNTRANSFER is already set above.
curl_setopt($ch, CURLOPT_HEADER, true); // include the response headers in the output
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
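curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // boolean switch: follow redirects
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // the redirect limit is a separate option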
curl_setopt($ch, CURLOPT_DNS_USE_GLOBAL_CACHE, false );
curl_setopt($ch, CURLOPT_DNS_CACHE_TIMEOUT, 2);
// Run the request once and keep the result; calling curl_exec() twice sends it twice.
$output = curl_exec($ch);
if ($output === false) {
    echo 'Curl error: ' . curl_error($ch);
} else {
    echo $output;
}
curl_close($ch);
?>
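To narrow down where the request stalls, here is a minimal sketch (separate from the script above) that sets a connect timeout apart from the overall timeout and reports the cURL error number together with timing information. The helper names buildFetchOptions, describeCurlFailure and fetchWithDiagnostics are made up for illustration, and the timeout values are arbitrary assumptions, not recommended settings.

```php
<?php
// Hypothetical helper: cURL options with a connect timeout that is
// separate from the overall timeout. The default values are assumptions.
function buildFetchOptions(int $connectTimeout = 10, int $totalTimeout = 20): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_IPRESOLVE      => CURL_IPRESOLVE_V4,
        CURLOPT_CONNECTTIMEOUT => $connectTimeout, // limit for the connection phase
        CURLOPT_TIMEOUT        => $totalTimeout,   // limit for the whole transfer
    ];
}

// Hypothetical helper: maps a cURL error number to the phase that failed.
function describeCurlFailure(int $errno): string
{
    switch ($errno) {
        case CURLE_COULDNT_RESOLVE_HOST: // 6: the host name could not be resolved
            return 'dns';
        case CURLE_COULDNT_CONNECT:      // 7: no TCP connection could be established
            return 'connect';
        case CURLE_OPERATION_TIMEOUTED:  // 28: one of the cURL timeouts was exceeded
            return 'timeout';
        default:
            return 'other';
    }
}

// Hypothetical helper: performs one request and collects diagnostics.
function fetchWithDiagnostics(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, buildFetchOptions());
    $body  = curl_exec($ch); // executed exactly once
    $errno = curl_errno($ch);

    // The handle is released when $ch goes out of scope at the end of the function.
    return [
        'ok'           => $body !== false,
        'body'         => $body === false ? null : $body,
        'phase'        => $body === false ? describeCurlFailure($errno) : null,
        'error'        => curl_error($ch),
        'connect_time' => curl_getinfo($ch, CURLINFO_CONNECT_TIME),
        'total_time'   => curl_getinfo($ch, CURLINFO_TOTAL_TIME),
        'primary_ip'   => curl_getinfo($ch, CURLINFO_PRIMARY_IP),
    ];
}

// Usage (performs a real network request, so it is left commented out):
// $result = fetchWithDiagnostics('http://primaire.recitus.qc.ca/sujets/13/personnages-marquants/3972');
// print_r($result);
```

If the reported phase is 'connect' or 'timeout' and connect_time stays at 0, the TCP connection from the server running the script was probably never established. That would suggest a network-level cause rather than a missing cURL option, though this is only a guess until the output is checked.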
It seems to me that I've covered most of the edge cases with curl_setopt, but tell me if I'm missing something. Any help is appreciated.