I'm scraping a website for work, so I can't share the URL, but when I curl the page I get a 400 response. The same happens when I open the page in Chrome.
$ curl -I <url>
HTTP/2 400
content-type: text/plain; charset=utf-8
accept-ranges: bytes
accept-ranges: bytes
via: 1.1 varnish
age: 0
accept-ranges: bytes
accept-ranges: bytes
date: Tue, 25 Sep 2018 19:34:37 GMT
via: 1.1 varnish
x-served-by: cache-mdw17368-MDW, cache-bos8235-BOS
x-cache: MISS, MISS
x-cache-hits: 0, 0
x-timer: S1537904078.900892,VS0,VE33
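For reference, two things I know differ between my curl call and Guzzle: curl -I sends a HEAD request (Guzzle sends GET), and curl's default headers are minimal. Something like the following would rule both out; the User-Agent string is just an example, and <url> stays redacted:

# Hypothetical check: repeat the request as a GET (no -I) with a
# browser-like User-Agent, printing only the status code.
curl -sS -o /dev/null -w '%{http_code}\n' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36' \
  <url>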
When I request the same URL with GuzzleHttp, however, it returns a 200 response code.
use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;

$client = new Client();

try {
    // Follow redirects and record each hop for later inspection
    $client->request('GET', $url, [
        'allow_redirects' => [
            'track_redirects' => true,
        ],
    ]);
} catch (GuzzleException $e) {
    return false;
}
This is making it really hard for me to work out which pages are relevant and which aren't. Is there an option that needs to be set?