preg_replace to strip only the closing tags

I'm working on a Joomla site that uses JotCache as its cache component. To exclude some modules from the cache directly in the template files, this component uses its own "markers", such as:

<jot myposition s> Module Position <jot myposition e>

Now I'm trying to minify the HTML through PHP using DOMDocument, but the result is the following, and the cache component no longer works:

<jot myposition s> Module Position <jot myposition e></jot></jot>

I'm thinking of using preg_replace to strip the </jot> closing tags. I tried the regex "/<[\/]*jot[^>]*>/i", but it strips all <jot> tags, including the required <jot myposition s> and <jot myposition e>.

Since I'm not sure how to accomplish this with DOMDocument (preventing tags from being closed automatically), how can I do this with preg_replace?

Any ideas would be greatly appreciated.

Thanks.

A Nice Chance to Explore some Regex Features!

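With all the disclaimers about using regex to work with xml-type documents... There are several interesting options for such a task. Note that each option below removes only the duplicated closing tag, so a single </jot> stays in the output.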

Option 1: Plain but Reliable

$replaced = preg_replace('%(<jot.*?</jot>)</jot>%', '$1', $yourstring);
  • Here, for safety, we match your whole string including the two </jot> at the end.
  • The .*? "lazy dot-star" quantifier ensures we don't accidentally run past the first closing </jot>.
  • The parentheses capture the string you want to Group 1
  • We replace with Group 1

Option 2: More "Cheeky"

$replaced = preg_replace('%</jot></jot>%', '</jot>', $yourstring);
  • Here, we just match </jot></jot>
  • We replace with </jot>

Option 3: Futuristic

$replaced = preg_replace('%</jot>(?=</jot>)%', '', $yourstring);
  • Here, we match </jot>, then the lookahead (?=</jot>) asserts that </jot> can be found again, but doesn't match it.
  • We replace with an empty string

Option 4: Keep Out!

$replaced = preg_replace('%<jot.*?</jot>\K</jot>%', '', $yourstring);
  • As in the first option, <jot.*?</jot> matches a whole tag...
  • Then \K tells the engine to drop whatever has been matched so far!
  • and </jot> matches the second </jot>
  • which we replace with the empty string
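To compare the four options side by side, here is a minimal sketch that runs each pattern against the sample string from the question. The variable names ($input, $options, $results, $stripped) are illustrative only. The last call is a hypothetical variant, not one of the options above, for the case where every closing </jot> has to go.

```php
<?php
// Sample input taken from the question.
$input = '<jot myposition s> Module Position <jot myposition e></jot></jot>';

// The four patterns from the options above: [pattern, replacement].
$options = [
    1 => ['%(<jot.*?</jot>)</jot>%', '$1'],
    2 => ['%</jot></jot>%', '</jot>'],
    3 => ['%</jot>(?=</jot>)%', ''],
    4 => ['%<jot.*?</jot>\K</jot>%', ''],
];

$results = [];
foreach ($options as $n => $pair) {
    $results[$n] = preg_replace($pair[0], $pair[1], $input);
    echo "Option $n: {$results[$n]}\n";
}

// Hypothetical variant (an assumption, not part of the options above):
// match only closing tags, so both </jot> are removed and the
// <jot ... s> / <jot ... e> markers are left alone.
$stripped = preg_replace('%</jot>%i', '', $input);
echo "All closing tags stripped: $stripped\n";
```

All four options should print the same string, ending in a single </jot>. If the cache component needs no closing tag at all, the last pattern is the one to reach for.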

The regex below matches everything from the first </ to the end of the line; replacing the match with an empty string removes it.

<\/.*$

Explanation:

  • < Matches a literal < symbol.
  • \/ Matches forward slash /
  • .*$ Matches all remaining characters up to the end of the line.

Your PHP code would be:

<?php
// Match from the first "</" to the end of the line.
$re = '~<\/.*$~';
$str = '<jot myposition s> Module Position <jot myposition e></jot></jot>';
$replacement = "";
echo preg_replace($re, $replacement, $str);
// => <jot myposition s> Module Position <jot myposition e>
?>
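One caveat worth checking before using this pattern: because .*$ runs to the end of the line, anything that follows the first closing tag is removed as well. The sketch below uses a hypothetical input (the trailing <p>Footer</p> is not from the question) and a narrower pattern, offered as an assumption rather than as part of the answer above, that only removes runs of </jot>.

```php
<?php
// Hypothetical input: the markers are followed by more markup on the same line.
$str = '<jot myposition s> Module Position <jot myposition e></jot></jot><p>Footer</p>';

// The broad pattern removes everything from the first "</" to the end of the line,
// so the trailing paragraph is lost too.
$broad = preg_replace('~<\/.*$~', '', $str);

// A narrower pattern removes only one or more consecutive closing </jot> tags.
$narrow = preg_replace('~(?:</jot>)+~i', '', $str);

echo $broad, "\n";
echo $narrow, "\n";
```

With minified HTML, where the whole page may sit on one line, the narrower form is probably the safer choice.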

If you're just going to strip </jot>, why don't you use a simpler approach by using str_replace?

$output = '<jot myposition s> Module Position <jot myposition e></jot></jot>';
$output = str_replace('</jot>', '', $output);

From the documentation:

If you don't need fancy replacing rules (like regular expressions), you should always use this function instead of preg_replace().
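As a small extension of the snippet above, str_replace() accepts an optional fourth argument that receives the number of replacements made, which can serve as a quick sanity check. The variable names $clean, $count and $mixed are illustrative; str_ireplace() is the case-insensitive counterpart, in case the tags ever come back in upper case.

```php
<?php
$output = '<jot myposition s> Module Position <jot myposition e></jot></jot>';

// The fourth argument is filled with the number of replacements performed.
$clean = str_replace('</jot>', '', $output, $count);

echo $clean, "\n";
echo "Removed $count closing tag(s)\n";

// Case-insensitive variant, shown on a made-up string.
$mixed = str_ireplace('</jot>', '', 'before</JOT>after');
echo $mixed, "\n";
```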