Flexible regular expression to remove part of the DOM

First, I know about Simple HTML DOM Parser and PHP's built-in DOM extension, but to my knowledge neither of them does exactly the kind of job I'm asking for.

I'm looking for a PCRE pattern for PHP that will find an element together with its content in the markup, delete it, and tolerate any extra whitespace in the markup.

Here is the markup:

<div id="maindiv">
    <div class="unusefuldiv1">Unuseful content</div>
    <div id="unusefuldiv2">Unuseful content2</div>
    <!--  ... some content I'm after -->
</div>

I desperately need a regular expression pattern that will delete both .unusefuldiv1 and #unusefuldiv2 (the markup together with its content) and that is, if possible, flexible enough to still work if, for example, <div class="unusefuldiv1"> is slightly mistyped with extra whitespace: <div class="unusefuldiv1" >.

That might be something similar to

preg_replace('/<div\b[^>]*>(.*?)<\/div>/is', '', $dom_content);

except that this pattern deletes every div, whether or not it has a class or an id.

Does anyone have a solution?

$dom_content = preg_replace( 
    '/\s*<div [^<>]*unuseful[^<>]+>.*?<\/div\s*>\s*/is', '', $dom_content );

will remove divs (and surrounding whitespace) whose opening tag contains the word unuseful.
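A quick way to check the pattern above is to run it against the markup from the question. In this minimal sketch the extra space before > mimics the mistyped tag, and the second input (made up for illustration) shows the main weakness: the lazy .*? stops at the first </div>, so a div containing a nested div is only partly removed.

```php
<?php
// Sample markup from the question, with the "mistyped" extra space before '>'
$dom_content = <<<'HTML'
<div id="maindiv">
    <div class="unusefuldiv1" >Unuseful content</div>
    <div id="unusefuldiv2">Unuseful content2</div>
    <p>Content I'm after</p>
</div>
HTML;

// The pattern from the answer: drop any div whose opening tag mentions "unuseful"
$pattern = '/\s*<div [^<>]*unuseful[^<>]+>.*?<\/div\s*>\s*/is';

$result = preg_replace($pattern, '', $dom_content);
echo $result, "\n";

// Hypothetical nested input: .*? ends the match at the inner </div>,
// leaving the outer div's tail and closing tag behind
$nested = '<div class="unuseful"><div>inner</div>tail</div>';
$nestedResult = preg_replace($pattern, '', $nested);
echo $nestedResult, "\n"; // prints: tail</div>
```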

For a better regex solution, you will need to describe the criteria for deleting a div more precisely.
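As an illustration of what narrower criteria could look like, here is a sketch that assumes (hypothetically) a div should be deleted only when its class attribute is exactly "unusefuldiv1" or its id attribute is exactly "unusefuldiv2", with double-quoted values. It tolerates extra whitespace around = and before >, and leaves other divs alone. Like the pattern above, it still cannot handle a nested div inside the target.

```php
<?php
// Hypothetical criteria: remove a <div> only if class is exactly "unusefuldiv1"
// or id is exactly "unusefuldiv2". Assumes double-quoted attribute values and
// no nested <div> inside the element being removed.
$pattern = '~\s*<div\b[^>]*?\s(?:class\s*=\s*"unusefuldiv1"|id\s*=\s*"unusefuldiv2")[^>]*>.*?</div\s*>\s*~is';

// Test markup with sloppy whitespace in the first tag and one div to keep
$dom_content = '<div id="maindiv">'
    . '<div  class = "unusefuldiv1" >Unuseful content</div>'
    . '<div id="unusefuldiv2">Unuseful content2</div>'
    . '<div class="usefuldiv">Keep me</div>'
    . '</div>';

$cleaned = preg_replace($pattern, '', $dom_content);
echo $cleaned, "\n";
```

Requiring exact attribute values avoids deleting unrelated divs that merely mention "unuseful", at the cost of missing elements that carry several classes.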

There is no reason not to use a dedicated DOM parser here:

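$dom = new DOMDocument();
// Collect libxml warnings about imperfect real-world markup instead of printing them
libxml_use_internal_errors(true);
$dom->loadHTML($dom_content);

$xpath = new DOMXPath($dom);

$id = 'unusefuldiv2';
$classname = 'unusefuldiv1';
// Match the div with that id, or any element whose class list contains $classname
$query = "//div[@id='$id']|//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";
foreach ($xpath->query($query) as $node) {
    // Detach each matched element (and everything inside it) from the tree
    $node->parentNode->removeChild($node);
}

echo $dom->saveHTML();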

Demo: http://eval.in/11108
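One side effect of the DOMDocument approach is that loadHTML() wraps a fragment in <html><body> and adds a default DOCTYPE, which saveHTML() then prints back. The self-contained sketch below assumes the fragment has a single root element and passes the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD options so that the output stays a fragment; the sample text is made up for illustration.

```php
<?php
// Fragment from the question (assumed to have a single root element)
$dom_content = <<<'HTML'
<div id="maindiv">
    <div class="unusefuldiv1" >Unuseful content</div>
    <div id="unusefuldiv2">Unuseful content2</div>
    <p>Content to keep</p>
</div>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$dom = new DOMDocument();
// Don't add implied <html>/<body> wrappers or a default DOCTYPE to the fragment
$dom->loadHTML($dom_content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);
$id = 'unusefuldiv2';
$classname = 'unusefuldiv1';
$query = "//div[@id='$id']|//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";

foreach ($xpath->query($query) as $node) {
    // Remove each matched element together with its content
    $node->parentNode->removeChild($node);
}

$html = $dom->saveHTML();
echo $html;
```

Unlike the regex versions, the parser copes with the extra space in the tag and with nested divs, because it works on elements rather than on text.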