解析外部HTML并返回图像

I'm building a site that depends on bookmarklets. These bookmarklets pull the URL and a couple of other elements. However, I need to select 1 image from the page the user bookmarks. Currently I'm trying to use the PHP Simple HTML DOM Parser http://simplehtmldom.sourceforge.net/

It pulls the HTML as expected, and returns the tags as expected. However, I want to take this a step further and only return images with a min width of 40px. I know about the function getimagesize() but from what I understand, this is resource heavy. Is there a better method available to pre-process the image and achieve the results I'm looking for?

Thanks!

First check if the image HTML tag has a width attribute. If it's above 40, skip over it. As Matthew mentioned, it will get false positives where people sized down a large image to 40px wide, but that's no big deal; the point of this step is to quickly weed out the first dozen or so images that are obviously too big.

Once the script catches an image that SAYS it's under 40px wide, check the header information to deduce a general width based on the size of the file. This is faster than getimagesize because you don't have to download the image to get the info.

function get_image_kb($path) {
    $headers = get_headers($path);
    $len = explode(" ",$headers[6]);
    return $len[1];
}


$imageKb = get_image_kb('test1.jpg');
// I'm going to gander 40x80 is about 2000kb
$cutoffSize = 2000;
if ($imageKb < $cutoffSize) {
    // this is the one!
}
else {
    // it was a phoney, keep scraping
}

Setting it at 2000kb will also let through images that are 100x30, which isn't good.

However, at this point, you've weeded out most of the huge 800kb files that would really slow you down, and because we know it's under 2kb, it's not too taxing to test this one with getimagesize() to get an accurate width.

You can tweak the process depending on how picky you are for the 40px mark, as usual higher accuracy takes more time, and vice versa.