I get the following error:
Warning: file_get_contents(https://www.readability.com/api/content/v1/parser?url=http://www.redmondpie.com/ps1-and-ps2-games-will-be-playable-on-playstation-4-very-soon/?utm_source=dlvr.it&utm_medium=twitter&token=MYAPIKEY) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 NOT FOUND in /home/DIR/htdocs/readability.php on line 23
With some echoes I checked the URL built by the function, and it is fine and valid; when I make the same request from my browser, it works.
The thing is that I get the error above with file_get_contents, and I really don't understand why.
The URL is valid, and the function is not blocked by the free hosting service (so I don't need cURL).
If someone could spot the error in my code, I would appreciate it! Thanks...
Here is my code:
<?php
class jsonRes {
    public $url;
    public $author;
    public $image;
    public $excerpt;
}

function getReadable($url) {
    $api_key = 'MYAPIKEY';
    if (isset($url) && !empty($url)) {
        // I tried changing to http, no 'www' etc... -THE URL IS VALID/The browser opens it normally-
        $requesturl = 'https://www.readability.com/api/content/v1/parser?url=' . urlencode($url) . '&token=' . $api_key;
        $response = file_get_contents($requesturl); // * here the code FAILS! *
        $g = json_decode($response);

        $article_url = $g->url;
        $article_author = '';
        if ($g->author != null) {
            $article_author = $g->author;
        }
        $article_image = '';
        if ($g->lead_image_url != null) {
            $article_image = $g->lead_image_url;
        }
        $article_excerpt = $g->excerpt;

        $toJSON = new jsonRes();
        $toJSON->url = $article_url;
        $toJSON->author = $article_author;
        $toJSON->image = $article_image;
        $toJSON->excerpt = $article_excerpt;

        $retJSONf = json_encode($toJSON);
        return $retJSONf;
    }
}
?>
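To see what the API actually returns on that 404 (rather than just the PHP warning), something like this should work: a stream context with ignore_errors, so the error body is returned instead of triggering the warning. An untested sketch, reusing $requesturl from the code above:

$context = stream_context_create(array(
    'http' => array('ignore_errors' => true), // return the body even on 4xx/5xx
));
$response = file_get_contents($requesturl, false, $context);
var_dump($http_response_header); // status line and headers of the failed request
echo $response;                  // the API's actual error message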
Sometimes a website will block crawlers (i.e. requests coming from remote servers) from getting to its pages.
The workaround is to spoof a browser's headers: pretend to be Mozilla Firefox instead of the sneaky PHP web scraper you are.
This is a function which uses the cURL library to do just that.
function get_data($url) {
    // Pretend to be Firefox so simple bot checks let the request through
    $userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13';

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // fail on HTTP codes >= 400
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    $html = curl_exec($ch);
    if ($html === false) {
        echo "<br />cURL error number:" . curl_errno($ch);
        echo "<br />cURL error:" . curl_error($ch);
        curl_close($ch);
        exit;
    }
    curl_close($ch);
    return $html;
}
One would then call it as below:
$response = get_data($requesturl);
cURL offers many more options for fetching remote content and checking for errors than file_get_contents does. If you want to customize it further, check out the list of cURL options here - Abridged list of cURL options
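That said, if cURL really isn't an option, the same user-agent trick can in principle be done with file_get_contents through a stream context. A minimal sketch (the function name is just an example, and the UA string is the same one used above):

function get_data_fgc($url) {
    // Send a browser-like User-Agent with file_get_contents
    $options = array(
        'http' => array(
            'user_agent' => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13',
            'timeout'    => 10,
        ),
    );
    return file_get_contents($url, false, stream_context_create($options));
}

$response = get_data_fgc($requesturl);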