Browsers consider an <option> selected by default if it has the selected="selected" attribute. But this also works if the attribute value is omitted.
So this works:
<option selected="selected" value="1">value text</option>
and so does this:
<option selected value="1">value text</option>
My question is how to write a regex pattern that matches both forms above, but never matches something like:
<option value="the devil with **selected** ">value text</option>
EDIT: I forgot to mention that some other forms are still accepted by browsers, like selected='selected', selected=selected, or even selected=SelEctEd.
After the discussion here and some other resources, like "RegEx match open tags except XHTML self-contained tags", I realized it's impractical to use regular expressions to parse XHTML accurately.
With PCRE (which PHP uses), this works for the two forms in the question, provided selected is followed by whitespace:
<option.*?\s(?:selected(?:=\"selected\")?)\s.*?>
# match <option literally
# followed by anything (non-greedy) and a whitespace character (required!)
# open a non-capturing group and match selected, optionally followed by ="selected"
# close the group; a whitespace character must follow
# followed by anything (non-greedy) and the closing >
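One way to see what the pattern accepts is to run it against the tags from the question. The sketch below uses Python's re module rather than PHP's PCRE (a substitution for illustration; this pattern only uses syntax both engines share). The sample names and the results dict are made up for the example, and the last sample is a variant of the question's counterexample with the asterisks removed.

```python
import re

# The pattern from the answer. The \" escapes are dropped because they are
# only needed inside a double-quoted PHP string.
PATTERN = re.compile(r'<option.*?\s(?:selected(?:="selected")?)\s.*?>')

# Hypothetical sample names mapped to the tags being checked.
samples = {
    "long": '<option selected="selected" value="1">value text</option>',
    "short": '<option selected value="1">value text</option>',
    # selected is the last attribute, so no whitespace follows it
    "trailing": '<option value="1" selected>value text</option>',
    # the word appears inside an attribute value, surrounded by spaces
    "in_value": '<option value="the devil with selected ">value text</option>',
}

results = {name: bool(PATTERN.search(tag)) for name, tag in samples.items()}

for name, matched in results.items():
    print(f"{name}: {matched}")
```

The two forms from the question match. The tag where selected comes last does not, because the pattern requires whitespace after it. The attribute-value case does match once the word is surrounded by spaces, which is the kind of false positive the question wanted to avoid.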
See a regex101 demo here. Also read the comments; there are good hints in there (using DOMDocument, etc.).
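As a rough sketch of the parser-based route, the following uses Python's standard-library html.parser instead of PHP's DOMDocument (a substitution for illustration, not the approach from the comments themselves). The class name SelectedOptionFinder and the helper selected_option_values are hypothetical. It assumes that the mere presence of a selected attribute marks the option as selected, whatever its value or letter case.

```python
from html.parser import HTMLParser


class SelectedOptionFinder(HTMLParser):
    """Collect the value of every <option> that carries a selected attribute."""

    def __init__(self):
        super().__init__()
        self.selected_values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs. Tag and attribute names
        # arrive lowercased, and a bare attribute such as `selected` has
        # the value None.
        if tag != "option":
            return
        attributes = dict(attrs)
        if "selected" in attributes:
            self.selected_values.append(attributes.get("value"))


def selected_option_values(markup):
    """Return the value attributes of all selected options in markup."""
    finder = SelectedOptionFinder()
    finder.feed(markup)
    finder.close()
    return finder.selected_values


markup = """
<select>
  <option selected="selected" value="1">a</option>
  <option selected value="2">b</option>
  <option value="3" selected=SelEctEd>c</option>
  <option value="the devil with selected ">d</option>
  <option value="5" selected='selected'>e</option>
</select>
"""

print(selected_option_values(markup))
```

Because the parser tokenizes attributes before the check runs, the word selected inside a value attribute is never mistaken for the attribute itself, and the quoting style and letter case no longer matter.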