I need to scrape a list of products, with their prices, from this site.
What do I need to add to scrape only this list of products (http://www.tehnomanija.rs/lcd-i-led--televizori)?
This is my code:
<?php
$curl = curl_init();
curl_setopt ($curl, CURLOPT_URL, "http://www.tehnomanija.rs/lcd-i-led--televizori");
curl_exec ($curl);
$result = curl_exec($curl);
curl_close ($curl);
//parser
preg_match("<td class=\"product_list_cell\">")siU, $result, $matches1);
$suscriptores = $matches1[1][0];
echo "Suscriptores: " . $suscriptores;
print $result;
?>
Take a look at https://github.com/tj/php-selector
It's essentially a wrapper for DOMDocument and DOMXPath that lets you use CSS selectors, like so:
$elements = select_elements('div#someId', $html);
Regex is the wrong tool for this task. Use XPath to retrieve the needed DOM nodes from the HTML. See an example.
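I might also mention some of your mistakes: you call curl_exec() twice and never set CURLOPT_RETURNTRANSFER, so the page is printed directly instead of being returned into $result; the regex is malformed (the modifiers sit outside the pattern string, and there is no capture group); and preg_match() stores the captured text as a string in $matches[1], so $matches1[1][0] would be at most a single character.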
So the code should be something like this:
<?php
$curl = curl_init();
curl_setopt ($curl, CURLOPT_URL, "http://www.tehnomanija.rs/lcd-i-led--televizori");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($curl);
curl_close ($curl);
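// parser: capture the contents of the first matching cell
// (no U modifier here: combined with .*? it would make the quantifier greedy again)
preg_match("/<td\s+class=\"product_list_cell\">(.*?)<\/td>/si", $result, $matches);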
print_r($matches[1]);
$suscriptores = $matches[1];
echo "Suscriptores: " . $suscriptores;
print $result;
Even so, you still can't reliably fetch the cells with a regex, since the inner structure mixes </td> tags from different nesting levels. Your only way is XPath.
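As a rough illustration of the XPath approach, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath. The sample markup in $html is hypothetical: it assumes each td with class product_list_cell wraps a nested table whose first two cells hold the name and the price, which may not match the real page. In practice you would pass the $result returned by curl_exec() to loadHTML() and adjust the inner query to the actual structure.

```php
<?php
// Hypothetical sample markup standing in for the fetched page ($result).
// The nested-table layout is an assumption, not the site's confirmed structure.
$html = <<<HTML
<table>
  <tr>
    <td class="product_list_cell">
      <table><tr><td>Samsung TV</td><td>49.999 RSD</td></tr></table>
    </td>
    <td class="product_list_cell">
      <table><tr><td>LG TV</td><td>39.999 RSD</td></tr></table>
    </td>
  </tr>
</table>
HTML;

// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select only the outer product cells; nested <td> elements have no such class.
$cells = $xpath->query('//td[@class="product_list_cell"]');

$products = [];
foreach ($cells as $cell) {
    // Query relative to the current cell to reach its nested cells.
    $inner = $xpath->query('.//td', $cell);
    if ($inner->length < 2) {
        continue; // skip cells that do not match the assumed layout
    }
    $products[] = [
        'name'  => trim($inner->item(0)->textContent),
        'price' => trim($inner->item(1)->textContent),
    ];
}

print_r($products);
```

Because the query works on the parsed tree rather than on raw text, the nested </td> tags that defeat the regex are not a problem here. Note that @class="product_list_cell" matches only when that is the element's sole class; if the real cells carry several classes, a contains() test would be needed instead.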