I'm trying to get content of this page: http://www.nytimes.com/2014/01/26/us/politics/rand-pauls-mixed-inheritance.html?hp&_r=0
I tried file_get_contents
and curl
solution but all gives me a Login page of NYTimes and I have no idea why.
Tried these file_get_contents()/curl getting unexpected page, PHP file_get_contents() behaves differently to browser, file_get_content get the wrong web
Is there any solution? Thanks
EDIT:
//this is the curl code I use
$cookieJar = dirname(__FILE__) . '/cookie.txt';
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar);
curl_setopt($ch, CURLOPT_URL, $link);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12');
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
try to test it using saving cookies to same directory where the script resides first
so set the cookies path like that
$cookie = "cookie.txt";
this code works with me and i got the page
<?php
function curl_get_contents($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$get_page = curl_get_contents("http://www.nytimes.com/2014/01/26/us/politics/rand-pauls-mixed-inheritance.html?hp&_r=1");
echo $get_page;
?>
I think you need cURL to allow cookies to be saved. Try adding these lines to the cURL setup. For me this worked:
$cookie = dirname(__FILE__) . "\cookie.txt";
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
Use Live HTTP Headers firefox plugin to check what is going on during page access. There can be redirections, cookie set etc. And then try to implement this behaviour with php curl (note: set user-agent as and other client headers the same as browser)