I'm trying to load an HTML page by using a URL. This is what I'm doing now to find the count of images on a page:
$html = "http://stackoverflow.com/";
$doc = new DOMDocument();
@$doc->loadHTML($html);
$tags = $doc->getElementsByTagName('*');
$count = 0;
foreach ($tags as $tag) {
    if (strcmp($tag->tagName, "img") == 0) {
        $count++;
    }
}
echo $count;
I know this isn't an efficient way to do it; I just set it up as an example. Each time, the count is 0, even though there are images on the page, which leads me to believe the page isn't loading correctly. What am I doing wrong? Thanks.
From the docs:
DOMDocument::loadHTML — Load HTML from a string
Its signature is quite clear about this, too:
public bool DOMDocument::loadHTML ( string $source [, int $options = 0 ] )
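In other words, your code parses the literal string "http://stackoverflow.com/" as if it were markup, and that string contains no img tags. You could use DOMDocument::loadHTMLFile instead, or fetch the markup of the given URL with file_get_contents or a cURL request (whichever works best for you) and pass the result to loadHTML.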
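As a rough sketch of the "fetch first, then parse" approach: the helper name parseMarkup and the inline sample markup are made up for illustration, and the network call is left commented out so the snippet runs offline.

```php
<?php
// Hypothetical helper (not part of DOMDocument): parse markup that has
// already been downloaded. loadHTML expects an HTML string, not a URL.
function parseMarkup(string $markup): DOMDocument
{
    $doc = new DOMDocument();
    $doc->loadHTML($markup);
    return $doc;
}

// Fetching is a separate step. With the URL from the question it could be:
//   $markup = file_get_contents('http://stackoverflow.com/');
//   if ($markup === false) { exit('Could not fetch the page'); }
// or let DOMDocument fetch and parse in one go:
//   $doc->loadHTMLFile('http://stackoverflow.com/');

// Sample markup (an assumption, standing in for the downloaded page).
$markup = '<html><body><img src="a.png"><p>text</p><img src="b.png"></body></html>';

$doc = parseMarkup($markup);
echo $doc->getElementsByTagName('img')->length, PHP_EOL;
```

Note that file_get_contents on a URL only works when allow_url_fopen is enabled; if it isn't, cURL is the usual fallback.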
And please don't use the error-suppression operator @ of death: if something emits a notice/warning/error, there's a problem. Don't ignore it, fix it!
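If the warnings come from the parser choking on real-world markup, one alternative to @ is to let libxml buffer its errors so you can inspect them. This is only a sketch: the sample markup and the $messages array are assumptions, and whether this particular markup triggers any parser error depends on the libxml2 version PHP was built against.

```php
<?php
// Sample markup (an assumption). Older libxml2 versions complain about
// HTML5 elements such as <nav>; newer ones may not report anything.
$markup = '<html><body><nav><img src="logo.png"></nav></body></html>';

// Buffer parser errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($markup);

// Inspect whatever the parser reported rather than silencing it.
$messages = [];
foreach (libxml_get_errors() as $error) {
    // Each entry is a LibXMLError with level, code, line and message.
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
libxml_clear_errors();

// Restore the earlier setting so other code is unaffected.
libxml_use_internal_errors($previous);

echo count($messages), " parser message(s)", PHP_EOL;
echo $doc->getElementsByTagName('img')->length, " image(s)", PHP_EOL;
```

This way the problems are still visible (you can log or display $messages), but sloppy third-party markup doesn't flood your output.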
The DOM specification reports HTML tag names in upper case, so a case-sensitive comparison against "img" is fragile; you can avoid the issue by using strcasecmp instead of strcmp.
Or avoid both problems by doing it properly:
$count = $doc->getElementsByTagName('img')->length;