I'm trying to scrape a page in PHP using cURL and Simple HTML DOM Parser, but I'm getting stuck when returning the results as JSON. The target site is a free web scraper test website.
function getPage($href) {
$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $href);
curl_setopt($curl, CURLOPT_REFERER, $href);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$str = curl_exec($curl);
$html = str_get_html($str);
curl_close($curl);
return $html;
}
$link = 'https://www.webscraper.io/test-sites/e-commerce/allinone/computers';
$data = getPage($link);
foreach ($data->find('div[class=col-sm-4 col-lg-4 col-md-4]') as $key => $finder) {
$img = $finder->find('img[class=img-responsive]');
$imgCrt = $img->src;
$price = $finder->find('h4[class=pull-right price]');
$priceCrt = $price->innertext;
$desc = $finder->find('p[class=description]');
$descCrt = $desc->innertext;
$json['status'] = 'ok';
$json['return'][] = [
'img' => $imgCrt,
'price' => $priceCrt,
'desc' => $descCrt
];
}
echo json_encode($json);
Result:
{"status":"ok","return":[{"img":null,"price":null,"desc":null},{"img":null,"price":null,"desc":null},{"img":null,"price":null,"desc":null}]}
And errors on lines 43, 45 and 47:
43 - $imgCrt = $img->src;
45 - $priceCrt = $price->innertext;
47 - $descCrt = $desc->innertext;
Without those lines my result page becomes blank, with no errors and no JSON results. Thanks in advance!
SOLUTION!!
While dumping the values, I discovered this:
var_dump($finder->find('img')[0]->src);
echo "<br />";
var_dump($finder->find('h4.price')[0]->innertext);
echo "<br />";
var_dump($finder->find('p.description')[0]->innertext);
Now it works like a charm with:
$img[$key] = $finder->find('img')[0]->src;
$price[$key] = $finder->find('h4.price')[0]->innertext;
$desc[$key] = $finder->find('p.description')[0]->innertext;
$json['return'][] = [
'img' => $img[$key],
'price' => $price[$key],
'desc' => $desc[$key]
];
Result: img: https://i.imgur.com/it9ZxEC.png
Thanks!
Is $img in $imgCrt = $img->src; an object or an array?
If it is an array, try $imgCrt = $img['src'];
If you are using PHP 7, once you have confirmed whether your variable is an array or an object, you could do something like this:
$imgCrt = $img['src'] ?? $img->src;
Written out, the ?? operator is equivalent to:
$imgCrt = isset($img['src']) ? $img['src'] : $img->src;
This assumes that your key is src in your $img variable.
Please see my comments for how to debug and inspect the values and their types.
Also remember to set an HTTP response code -> http://php.net/manual/en/function.http-response-code.php
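As a rough sketch of what setting the response code could look like alongside the JSON output: the helper name buildJsonResponse and the choice of 502 for an empty result are assumptions made for illustration, not something from the question. Only http_response_code(), header() and json_encode() are standard PHP.

```php
<?php
// Hypothetical helper (not from the question): picks an HTTP status
// and builds the JSON body for the scraped items.
function buildJsonResponse(array $items): array
{
    // Assumption: an empty result means the scrape failed upstream.
    $ok = count($items) > 0;

    return [
        'status' => $ok ? 200 : 502,
        'body'   => json_encode([
            'status' => $ok ? 'ok' : 'error',
            'return' => $items,
        ]),
    ];
}

// Example items; in the real script these come from the foreach loop.
$response = buildJsonResponse([
    ['img' => '/images/a.png', 'price' => '$295.99', 'desc' => 'Example item'],
]);

// Send the status and content type before any output.
http_response_code($response['status']);
header('Content-Type: application/json');
echo $response['body'];
```

Keeping the status decision in one place means a client can tell from the status code alone that nothing was scraped, without parsing the body.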
You aren't finding any elements in your ->find call, which is why you're getting those errors.
Simple HTML DOM uses CSS selectors in the find method. The attribute value you're searching for contains spaces, so it must be quoted.
Also, find returns an array unless you specify an index:
foreach ($data->find('div[class="col-sm-4 col-lg-4 col-md-4"]') as $key => $finder) {
$img = $finder->find('img[class=img-responsive]', 0);
$imgCrt = $img->src;
$price = $finder->find('h4[class="pull-right price"]', 0);
$priceCrt = $price->innertext;
$desc = $finder->find('p[class=description]', 0);
$descCrt = $desc->innertext;
$json['status'] = 'ok';
$json['return'][] = [
'img' => $imgCrt,
'price' => $priceCrt,
'desc' => $descCrt
];
}
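When find() is called with an index and nothing matches, it returns null, so reading ->src or ->innertext on the result fails again. A minimal sketch of a guard, assuming a hypothetical helper named nodeProperty (not part of Simple HTML DOM):

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): returns the named
// property of a node, or null when the selector matched nothing.
function nodeProperty($node, string $property)
{
    // find($selector, 0) yields null when there is no match.
    if (!is_object($node)) {
        return null;
    }

    return $node->$property;
}

// Intended use inside the loop (requires the Simple HTML DOM library):
// $imgCrt   = nodeProperty($finder->find('img', 0), 'src');
// $priceCrt = nodeProperty($finder->find('h4.price', 0), 'innertext');
// $descCrt  = nodeProperty($finder->find('p.description', 0), 'innertext');
```

With this guard a missing element shows up as null in the JSON instead of raising an error, which makes it easier to spot which selector is not matching.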
Check whether find() is returning data inside the foreach loop:
var_dump($key);
var_dump($finder);
or,
print_r($finder);
print_r($key);