Regex for selectively stripping HTML

I'm trying to parse some HTML with PHP as an exercise, outputting it as just text, and I've hit a snag. I'd like to remove any tags that are hidden with style="display: none;" - bearing in mind that the tag may contain other attributes and style properties.

The code I have so far is this:

$page = preg_replace("#<([a-z]+).*?style=\".*?display:\s*none[^>]*>.*?</\1>#s","",$page);

The code is returning NULL with a PREG_BACKTRACK_LIMIT_ERROR.
I tried this instead:

$page = preg_replace("#<([a-z]+)[^>]*?style=\"[^\"]*?display:\s*none[^>]*>.*?</\1>#s","",$page);

But now it's just not replacing any tags.

Any help would be much appreciated. Thanks!

Using DOMDocument, you can try something like this:

$doc = new DOMDocument;
$doc->loadHTMLFile("foo.html");
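// Copy the live node list into an array so removals don't disturb the iteration.
$nodes = iterator_to_array($doc->getElementsByTagName('*'));
foreach ($nodes as $node) {
    // Allow any whitespace around the colon, e.g. "display:none" or "display : none".
    if (preg_match('/display\s*:\s*none/i', $node->getAttribute('style'))) {
        // removeChild() must be called on the node's actual parent, not on the document.
        $node->parentNode->removeChild($node);
    }
}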
$doc->saveHTMLFile("foo.html");

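You should never parse HTML with regex; it will make your eyes bleed. HTML is not a regular language: elements can nest to any depth, and a regular expression cannot reliably pair each opening tag with its own closing tag. It should be parsed with a DOM parser instead.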
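
To illustrate the nesting problem, here is a minimal sketch rather than a definitive implementation. The sample markup in $html is made up for this example. The pattern is the question's second attempt, written in single quotes on the assumption that \1 should reach PCRE as a backreference (inside a double-quoted PHP string, "\1" is read as an octal escape). The DOM half uses DOMXPath to select elements that carry a style attribute, which is just one way to do it.

```php
<?php
// Hypothetical input: a hidden <div> that itself contains another <div>.
$html = '<div style="display: none"><div>inner</div>tail</div><p>visible</p>';

// Regex attempt: the lazy .*? stops at the FIRST </div> it meets,
// which closes the inner element, not the hidden outer one.
$pattern = '#<([a-z]+)[^>]*style="[^"]*display:\s*none[^>]*>.*?</\1>#s';
$regexResult = preg_replace($pattern, '', $html);

// DOM attempt: parse the markup, then remove every element whose
// style attribute declares display: none, together with its subtree.
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Snapshot the matches into an array before removing anything.
$styled = iterator_to_array($xpath->query('//*[@style]'));
foreach ($styled as $node) {
    if (preg_match('/display\s*:\s*none/i', $node->getAttribute('style'))) {
        $node->parentNode->removeChild($node);
    }
}

// Text that remains once the hidden element is gone.
$domText = $doc->documentElement->textContent;

echo $regexResult, "\n";
echo $domText, "\n";
```

With this input the regex leaves the fragment tail</div> behind, because it cannot tell which closing tag belongs to the hidden element, while the DOM version drops the whole hidden subtree and keeps only the visible paragraph.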

See also: Parse HTML to DOM with PHP