I'm working on a Joomla site that uses JotCache as its cache component. To exclude some modules from the cache directly in template files, this component uses its own "markers", such as:
<jot myposition s> Module Position <jot myposition e>
Now I'm trying to minify the HTML through PHP using DOMDocument, but the result is the following, and the cache component no longer works:
<jot myposition s> Module Position <jot myposition e></jot></jot>
I'm thinking of using preg_replace to strip the </jot> closing tags. I tried the regex "/<[\/]*jot[^>]*>/i", but it strips all <jot> tags, including the required <jot myposition s> and <jot myposition e>.
Since I'm not sure how to accomplish this with DOMDocument (prevent tags closing automatically), how can I do this with preg_replace?
Any ideas would be much appreciated.
Thanks.
A Nice Chance to Explore some Regex Features!
With all the usual disclaimers about using regex to work with XML-type documents, there are several interesting options for such a task.
Option 1: Plain but Reliable
$replaced = preg_replace('%(<jot.*?</jot>)</jot>%', '$1', $yourstring);
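The parentheses capture the whole tag into Group 1, and the replacement $1 puts it back, dropping the extra </jot> at the end.
The .*? "lazy dot-star" quantifier ensures we don't accidentally run past the first closing </jot>.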
Option 2: More "Cheeky"
$replaced = preg_replace('%</jot></jot>%', '</jot>', $yourstring);
Match the doubled </jot></jot> and replace it with a single </jot>.
Option 3: Futuristic
$replaced = preg_replace('%</jot>(?=</jot>)%', '', $yourstring);
Match </jot>, then the lookahead (?=</jot>) asserts that </jot> can be found again, but doesn't match it.
Option 4: Keep Out!
$replaced = preg_replace('%<jot.*?</jot>\K</jot>%', '', $yourstring);
<jot.*?</jot> matches a whole tag.
\K tells the engine to drop whatever has been matched so far!
</jot> matches the second </jot>.
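To compare the four options side by side, here is a minimal sketch that runs each pattern against the sample string from the question. The $input, $patterns and $results names are illustrative and not part of the answer.

```php
<?php
// Sample output from the question (DOMDocument appended two closing tags).
$input = '<jot myposition s> Module Position <jot myposition e></jot></jot>';

// Pattern => replacement, one entry per option above.
$patterns = [
    '%(<jot.*?</jot>)</jot>%' => '$1',     // Option 1: capture group
    '%</jot></jot>%'          => '</jot>', // Option 2: literal pair
    '%</jot>(?=</jot>)%'      => '',       // Option 3: lookahead
    '%<jot.*?</jot>\K</jot>%' => '',       // Option 4: \K
];

$results = [];
foreach ($patterns as $pattern => $replacement) {
    $results[] = preg_replace($pattern, $replacement, $input);
}

// Each option prints:
// <jot myposition s> Module Position <jot myposition e></jot>
foreach ($results as $i => $result) {
    echo 'Option ', $i + 1, ': ', $result, PHP_EOL;
}
```

On this particular input every option removes one tag of the doubled pair, so a single </jot> is left behind. Whether that is acceptable depends on what JotCache expects to find after the end marker.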
For a single-line string like the one in the question, the regex below matches everything from the first </ to the end of the string, and the replacement swaps the matched text for an empty string.
<\/.*$
Explanation:
< matches a literal < symbol.
\/ matches a forward slash /.
.* matches every remaining character, and $ anchors the match at the end of the string.
Your PHP code would be:
<?php
// Match from the first "</" to the end of the string.
$re = '~<\/.*$~';
$str = '<jot myposition s> Module Position <jot myposition e></jot></jot>';
$replacement = "";
echo preg_replace($re, $replacement, $str);
// => <jot myposition s> Module Position <jot myposition e>
?>
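One thing worth checking before running this pattern on a full page: because .*$ runs to the end of the string, anything that follows the first closing tag is removed as well. The sketch below uses a made-up input ($page, with the markers wrapped in a hypothetical <div>) to show the effect, and contrasts it with a narrower pattern that targets only </jot>. The narrower pattern is an assumption for illustration, not part of the answer.

```php
<?php
// Hypothetical input: the jot markers sit inside another element.
$page = '<div><jot myposition s> Module Position <jot myposition e></jot></jot></div>';

// The pattern from the answer: removes from the first "</" to the end.
$greedy = preg_replace('~<\/.*$~', '', $page);

// A narrower pattern (illustrative assumption): only "</jot>", any case.
$narrow = preg_replace('~</jot>~i', '', $page);

echo $greedy, PHP_EOL; // <div><jot myposition s> Module Position <jot myposition e>
echo $narrow, PHP_EOL; // <div><jot myposition s> Module Position <jot myposition e></div>
```

The first result has lost the closing </div>, while the second keeps it, so the original pattern seems safest when the stray closing tags are known to be the last thing in the string.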
If you're just going to strip </jot>, why don't you use a simpler approach by using str_replace?
$output = '<jot myposition s> Module Position <jot myposition e></jot></jot>';
$output = str_replace('</jot>', '', $output);
From the documentation:
If you don't need fancy replacing rules (like regular expressions), you should always use this function instead of preg_replace().
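As a small extension of the str_replace snippet, the function accepts an optional fourth argument that receives the number of replacements made, which can help confirm that the expected closing tags were actually found. The $clean, $count and $cleanAnyCase names are illustrative. str_ireplace is the case-insensitive counterpart, shown here on a made-up string in case the tag ever appears as </JOT>.

```php
<?php
$output = '<jot myposition s> Module Position <jot myposition e></jot></jot>';

// $count is filled in by reference with the number of replacements made.
$clean = str_replace('</jot>', '', $output, $count);

echo $clean, PHP_EOL; // <jot myposition s> Module Position <jot myposition e>
echo $count, PHP_EOL; // 2

// Case-insensitive variant on a hypothetical mixed-case input.
$cleanAnyCase = str_ireplace('</jot>', '', '<jot a s> X <jot a e></JOT></Jot>');
echo $cleanAnyCase, PHP_EOL; // <jot a s> X <jot a e>
```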