I need to fetch data about a product from a given URL: images, product title, price, etc. I'm currently fetching all of the images on the page with a simple PHP file_get_contents call, and that works fine. What is the best practice for fetching the other data, though? I need to be able to fetch data from Etsy, Zappos, ASOS, Net-a-Porter, Nordstrom and PopSugar. Do I need a bot? Is it even possible? Thank you very much in advance!
You can use file_get_contents() to obtain the HTML for the page, but after that you will need to parse it into a DOM to find the elements you want to read information from (the src of images, the href of anchors, etc.).
There are actually several ways to do what you want, and without more information it is rather hard to give you a specific answer, but you can start with something like:
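$html = file_get_contents('your url');
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$Dom = new DOMDocument();
$Dom->loadHTML($html);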
At this point you have a DOMDocument (http://www.php.net/manual/en/class.domdocument.php) object loaded with the full contents of your page.
You can then select elements with, for example, XPath.
An example:
$XPath = new DOMXPath($Dom);
$Anchors = $XPath->query('//a');
for ($i = 0; $i < $Anchors->length; $i++) {
    $Anchor = $Anchors->item($i);
    echo 'Href #' . $i . ': ' . $Anchor->getAttribute('href') . '<br />';
}
The code above prints the href of every anchor on the page. It is only a basic example, but the same approach (load the page into a DOMDocument, then query it with XPath) is powerful enough to extract whatever you need. You will still need to dive into DOMDocument and XPath to learn how to select exactly the elements you want, but that shouldn't be too hard from this point on.
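To make this more concrete for product data, here is a minimal sketch of pulling a title, a price and image URLs with XPath. The markup, the class names and the firstValue() helper are all made up for illustration; each of the shops you listed will have its own structure, so you would need to inspect their pages and adjust the queries. Many shops do expose Open Graph meta tags such as og:title, which is why the sketch tries that first, but that is an assumption to verify per site.

```php
<?php
// Hypothetical sample markup standing in for a fetched product page.
$html = <<<'HTML'
<html><head>
<meta property="og:title" content="Example Shoe">
<meta property="og:image" content="https://example.com/shoe.jpg">
</head><body>
<h1 class="product-title">Example Shoe</h1>
<span class="price">$49.99</span>
<img src="/img/shoe-1.jpg"><img src="/img/shoe-2.jpg">
</body></html>
HTML;

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Hypothetical helper: trimmed value of the first node matching $query, or null.
function firstValue(DOMXPath $xpath, string $query): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->nodeValue);
}

$product = [
    // Prefer the Open Graph title, fall back to the first <h1>.
    'title' => firstValue($xpath, '//meta[@property="og:title"]/@content')
        ?? firstValue($xpath, '//h1'),
    // Assumed class name; real sites will differ.
    'price' => firstValue($xpath, '//*[contains(@class, "price")]'),
    'images' => [],
];

// Collect the src attribute of every <img> on the page.
foreach ($xpath->query('//img/@src') as $src) {
    $product['images'][] = $src->nodeValue;
}

print_r($product);
```

Keep in mind that some of these sites render parts of the page with JavaScript, in which case the data will not be in the HTML that file_get_contents() returns, and you should check each site's terms of use before scraping it.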