I need to convert the string
This <span style="font-size: 16px;" style="color: red;">is</span> a test.
to
This <span style="font-size: 16px; color: red;">is</span> a test.
There's also the possibility that there could be more than two matches, or that there could be a style, then a class, then another style, and the styles would need to be combined. And they won't always be spans.
Unfortunately, Tidy isn't an option, as it is more overbearing in its cleaning than this project can accommodate.
Going the DOM document route won't work since multiple style attributes aren't valid, so it only gets the contents of the first one.
I'd like to do it with preg_replace, but getting just the matches from one tag is proving to be quite difficult.
If it makes things easier, they start life as nested tags. I have a preg_replace that combines them from there and gives this output.
I agree with the comments above that the best solution is to prevent this situation in the first place, but to answer your question: This function will combine all of the style attributes in the given string. Just make sure to pass only a single tag at a time. It doesn't matter how many other attributes are in the tag, nor does the order matter. It will combine all of the style attributes into the first style value, then remove all other style attributes:
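/**
 * Combine all style attributes of a single tag into the first one.
 *
 * @param string $str A single tag, e.g. '<span style="..." style="...">'
 * @return string
 */
function combineStyles($str)
{
    $found = preg_match_all('/style="([^"]+)"/', $str, $matches);
    if ($found)
    {
        // Strip trailing semicolons and spaces so the join does not produce ";;".
        $values = array_map(function ($value) { return rtrim($value, "; "); }, $matches[1]);
        $combined = 'style="' . implode('; ', $values) . ';"';
        // Replace the first style attribute with the combined one and blank out the rest.
        $patterns = $matches[0];
        $replace = array_pad(array($combined), count($patterns), '');
        $str = str_replace($patterns, $replace, $str);
    }
    return $str;
}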
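Since the function above expects a single tag at a time, one way to run the same idea over a whole string is to hand each opening tag to a callback. The following is only a minimal sketch under a few assumptions: attribute values are double-quoted, no ">" appears inside an attribute value, and the names mergeTagStyles and mergeStylesInHtml are hypothetical helpers, not part of the original answer.

```php
<?php
// Hypothetical helper: merges the style attributes of ONE opening tag.
// Assumes double-quoted attribute values, as in the question.
function mergeTagStyles($tag)
{
    $pattern = '/\s+style="([^"]*)"/';
    if (preg_match_all($pattern, $tag, $matches) < 2)
    {
        return $tag; // zero or one style attribute: nothing to merge
    }

    // Drop stray semicolons and whitespace, and skip empty values.
    $values = array();
    foreach ($matches[1] as $value)
    {
        $value = trim($value, "; \t\r\n");
        if ($value !== '')
        {
            $values[] = $value;
        }
    }
    $combined = ' style="' . implode('; ', $values) . ($values ? ';' : '') . '"';

    // Keep the merged attribute in the first position and remove the others.
    $seen = false;
    return preg_replace_callback($pattern, function ($m) use (&$seen, $combined)
    {
        if ($seen)
        {
            return '';
        }
        $seen = true;
        return $combined;
    }, $tag);
}

// Hypothetical wrapper: applies mergeTagStyles() to every opening tag.
// Assumes no ">" appears inside an attribute value.
function mergeStylesInHtml($html)
{
    return preg_replace_callback('/<[a-z][^>]*>/i', function ($m)
    {
        return mergeTagStyles($m[0]);
    }, $html);
}

echo mergeStylesInHtml('This <span style="font-size: 16px;" class="x" style="color: red;">is</span> a test.');
// prints: This <span style="font-size: 16px; color: red;" class="x">is</span> a test.
```

Because the pattern requires whitespace directly before "style", attributes such as data-style are left alone, and other attributes between the style attributes stay where they were.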
Wait, I've just realized it won't work with style="" id="" style="".
<?php
$str = 'This <span style="font-size: 16px" style="color: red;">is</span> a test. This <span style="font-size: 16px;" style="color: red;">is</span> a test.';
// Collapse each '" style="' boundary so adjacent style attributes become one.
while (preg_match('/"\s+style="/', $str, $matches))
{
    $pos = strpos($str, $matches[0]);
    $prev = substr($str, 0, $pos);
    // Make sure the first set of declarations ends with a semicolon before joining.
    if (substr(trim($prev), -1) != ";")
    {
        $prev .= ";";
    }
    $str = $prev . substr($str, $pos + strlen($matches[0]));
}
?>
Using .NET regular expressions within Visual Studio 2012's Quick Replace, this expression worked for me:
Find:
style\s*=\s*(?<q2>['"])(?<w1>(?:(?!\k<q2>).)*?);?\k<q2>\s*(?<c>[^<>]*)\s*style\s*=\s*(?<q2>['"])(?<w2>(?:(?!\k<q2>).)*?);?\k<q2>
Replace:
style="${w1};${w2};" ${c}
Notes:
1. This will only merge two style attributes at a time. If there are more than that within a single tag, multiple runs will be required.
2. Any content between the two style attributes will be placed after the first style attribute (which is where the merged style attribute will be placed).
Explanation
Find:
style # match a style attribute
\s* # match any optional white space
= # match equals sign
\s* # match any optional white space
(?<q2>['"]) # match either a single or double quote and store it in named capture 'q2'
(?<w1> # start capture of first style attribute's content
(?: # start non-capturing match
(?!\k<q2>) # negative look-ahead to prevent matching on this attribute's quote
.)*? # end non-capturing match with minimal, 0-many quantifier
) # end capture of first style attribute's content
;? # place trailing semi-colon (if present) outside the capture
\k<q2> # match closing quote
\s* # match white space
(?<c>[^<>]*) # capture content between style attributes
\s* # match white space
... # repeat the above for a second style attribute
# except that the second style's capture is named 'w2'
Replacement:
style=" # start merged style attribute
${w1}; # place first style attribute's content
${w2}; # place second style attribute's content
" # finish merged style attribute
${c} # restore any content found between the two style attributes
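Since the question is about PHP, the same two-at-a-time idea could be looped there until nothing is left to merge, which covers the "multiple runs" mentioned in the notes. This is only a sketch, not a tested port: the function name mergeStylePairs is hypothetical, the quote groups are renamed q1 and q2 because PCRE does not accept duplicate group names by default, the "between" capture is made lazy so it does not swallow trailing white space, and a callback is used because preg_replace has no named references in its replacement string.

```php
<?php
// Hypothetical PHP port of the Find/Replace pair above.
// Merges two style attributes per match and repeats until none are left.
function mergeStylePairs($html)
{
    $pattern = '/style\s*=\s*(?<q1>[\'"])(?<w1>(?:(?!\k<q1>).)*?);?\k<q1>'
             . '\s*(?<c>[^<>]*?)\s*'
             . 'style\s*=\s*(?<q2>[\'"])(?<w2>(?:(?!\k<q2>).)*?);?\k<q2>/';
    do
    {
        $html = preg_replace_callback($pattern, function ($m)
        {
            // Restore whatever sat between the two attributes, if anything.
            $between = $m['c'] === '' ? '' : ' ' . $m['c'];
            return 'style="' . $m['w1'] . ';' . $m['w2'] . ';"' . $between;
        }, $html, -1, $count);
    } while ($count > 0); // every match removes one style attribute

    return $html;
}

echo mergeStylePairs('<p style="a: 1" id="x" style="b: 2" style="c: 3">');
// prints: <p style="a: 1;b: 2;c: 3;" id="x">
```

As with the original expression, the merged attribute is always written with double quotes, so a single-quoted value that itself contains a double quote would need extra handling.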