I'm using http://simplehtmldom.sourceforge.net/ and noticed that, both in its examples and when scraping certain sites, only some of them return results.
Here's my code:
include_once('../../simple_html_dom.php');
// Create DOM from URL or file
$website = 'http://www.digg.com/';
$html = file_get_html($website);
// Find all images
foreach($html->find('img') as $element)
echo "<img src=\"" . $website . $element->src . "\"" . '<br>';
This shows a bunch of thumbnails, but they are mostly blank (and it isn't returning all of the thumbnails).
Is it because they have some sort of .htaccess restriction in place? This happens on multiple websites.
You're assuming that $element->src is always relative to $website, which it may well not be.
For example, $element->src could already be http://www.digg.com/image.jpg, in which case $website . $element->src would give you http://www.digg.com/http://www.digg.com/image.jpg, which won't work.
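If you want to see which case you're dealing with, parse_url will tell you whether a src already carries a scheme (just a quick sketch; the sample paths are made up):
// parse_url returns the scheme for an absolute src, NULL for a relative one
var_dump(parse_url('http://www.digg.com/image.jpg', PHP_URL_SCHEME)); // string "http"
var_dump(parse_url('/images/thumb.jpg', PHP_URL_SCHEME));             // NULL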
Try this:
include_once('../../simple_html_dom.php');
// Create DOM from URL or file
$website = 'http://www.digg.com/';
$html = file_get_html($website);
// Find all images
foreach($html->find('img') as $element) {
    // don't want a leading slash (avoids double slashes)
    $src = ltrim($element->src, '/');
    // don't want the site URL twice if src is already absolute
    $src = str_replace($website, "", $src);
    echo "<img src=\"" . $website . $src . "\">" . '<br>';
}
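If you'd rather make the relative/absolute handling explicit instead of stripping strings, something along these lines should also work (untested sketch; resolve_src is just an illustrative name, and it doesn't handle protocol-relative // URLs):
// return a usable URL whether src is absolute or relative to the site root
function resolve_src($website, $src) {
    // already absolute? leave it alone
    if (parse_url($src, PHP_URL_SCHEME) !== null) {
        return $src;
    }
    // otherwise prefix the site, avoiding a double slash
    return $website . ltrim($src, '/');
}

foreach($html->find('img') as $element) {
    echo "<img src=\"" . resolve_src($website, $element->src) . "\">" . '<br>';
}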