
I'm using an image scraping feature developed here: https://github.com/morshedalam/url-scraper-php

They are using this regular expression to find images:

private $img_expression = '/<img[^>]+src=([\'"])?((?(1).+?|[^\s>]+))(?(1)\1)/';

This works, but it returns every single image, including tiny ones. Much like Pinterest, Facebook, etc., I'm only interested in images that can serve as thumbnails, i.e. width > 200px. I realize that the dimensions of an image might not be defined in the HTML source.

How would you do this?

Cheers.

A regex only sees the markup, so it cannot tell you how large an image actually is. You need to download the extracted images, get their size, and select those that are large enough.

Interestingly, there's an SO answer just for that: "php get all the images from url which width and height >=200 more quicker".
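Here is a minimal sketch of the download-and-measure approach, reusing the regex from the question. The helper names `extractImageUrls` and `filterByWidth` are made up for illustration and are not part of url-scraper-php. It assumes the src values are absolute URLs and that `allow_url_fopen` is enabled, so `getimagesize()` can read remote files. The optional `$probe` callable is my own addition so the size lookup can be swapped out (for caching, or for testing without network access).

```php
<?php
// Regex taken from the question; capture group 2 holds the src value.
const IMG_EXPRESSION = '/<img[^>]+src=([\'"])?((?(1).+?|[^\s>]+))(?(1)\1)/';

/** Return every src value found in the HTML (hypothetical helper). */
function extractImageUrls(string $html): array
{
    preg_match_all(IMG_EXPRESSION, $html, $matches);
    return $matches[2];
}

/**
 * Keep only images strictly wider than $minWidth pixels (hypothetical helper).
 * $probe maps a URL to [width, height, ...] or false on failure;
 * it defaults to getimagesize(), which fetches the image to read its size.
 */
function filterByWidth(array $urls, int $minWidth = 200, ?callable $probe = null): array
{
    $probe = $probe ?? function (string $url) {
        // Suppress warnings for unreachable or non-image URLs.
        return @getimagesize($url);
    };

    $kept = [];
    foreach ($urls as $url) {
        $size = $probe($url);
        // Index 0 of the result is the width in pixels.
        if ($size !== false && $size[0] > $minWidth) {
            $kept[] = $url;
        }
    }
    return $kept;
}
```

Typical usage would be something like `filterByWidth(extractImageUrls($html))`. Keep in mind that this fetches every image, so it can be slow on pages with many images, which is what the linked question is about.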