Prevent PHP preg_replace from replacing text inside shortcode tags

In WordPress, we can use a shortcode inside post content with this format:

[shortcode]data[/shortcode]

For example:

[shortcode]Lorem ipsum[/shortcode] dolor sit amet, consectetur adipiscing elit. Praesent laoreet fermentum lacinia. Ut molestie purus interdum lacus pharetra placerat.

My question is: what regular expression do we need in order to replace text anywhere in the post content except inside the shortcode?

s/.*\[shortcode]([^[]*)\[\/shortcode].*/\1/gm
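For reference, here is a rough PHP equivalent of that substitution, as a sketch only. preg_replace already replaces every match, so the g flag has no counterpart, and ~ is used as the delimiter (my choice) so the slash needs no escaping. The sample string is a shortened version of the one in the question. Note that this keeps only the shortcode's contents and discards the rest of the line, so it extracts the shortcode text rather than protecting it.

```php
<?php
// Sample post content, shortened from the question.
$content = '[shortcode]Lorem ipsum[/shortcode] dolor sit amet, consectetur adipiscing elit.';

// The m flag is carried over from the original; it has no effect here because
// the pattern contains no ^ or $. Without the s flag, . stops at newlines,
// so each match stays on a single line.
$pattern = '~.*\[shortcode]([^[]*)\[/shortcode].*~m';

// Replace the whole matched line with the text captured between the tags.
$extracted = preg_replace($pattern, '$1', $content);

echo $extracted, PHP_EOL;
```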

The question doesn't say what text should be replaced or with what. Consider this code, which searches for and replaces a pattern only outside the shortcode tags.

Objective: italicize each occurrence of foo, except inside shortcode tags.

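$str = "foo
 bar[shortcode]foo[/shortcode]foo 123
 bar [shortcode]foo[/shortcode] foo"; // post content
$query = "foo"; // search pattern

// Escape regex metacharacters in the search term, including the ~ delimiter.
$query = preg_quote($query, '~');
// Group 1 lazily consumes any complete [shortcode]...[/shortcode] block (plus the
// text after it) that precedes the search term; group 3 is the search term itself.
// Flags: s lets . match newlines, i ignores case.
$p = '~((\[shortcode\])(?(2).*?\[/shortcode\])(?:.*?))*?(' . $query . ')~smi';
// Put back whatever group 1 consumed, then the italicized search term.
$s = preg_replace($p, "$1<i>$3</i>", $str);
var_dump($s);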

OUTPUT

string(100) "<i>foo</i>
 bar[shortcode]foo[/shortcode]<i>foo</i> 123
 bar [shortcode]foo[/shortcode] <i>foo</i>"

As you can see, the output above wraps the search text foo in italics only outside the shortcode tags.
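One caveat, as far as I can tell: the pattern above only skips a shortcode when another foo follows it, so a foo inside the last shortcode can still be matched if no foo comes after it (try "[shortcode]foo[/shortcode] bar"). A possible alternative, sketched below, matches either a whole shortcode block or the search term and lets a callback return the shortcode block untouched. The function name italicize_outside_shortcode is made up for this example, and the sketch assumes shortcodes are not nested and carry no attributes.

```php
<?php
// Hypothetical helper: wrap $query in <i> tags everywhere except inside
// [shortcode]...[/shortcode] blocks. Assumes no nesting and no attributes.
function italicize_outside_shortcode(string $content, string $query): string
{
    $quoted = preg_quote($query, '~');
    // Alternative 1 (group 1) swallows a complete shortcode block;
    // alternative 2 matches the search term anywhere else.
    $pattern = '~(\[shortcode\].*?\[/shortcode\])|' . $quoted . '~si';

    $result = preg_replace_callback($pattern, function (array $m): string {
        // Group 1 is only set when a shortcode block matched: return it unchanged.
        if (!empty($m[1])) {
            return $m[0];
        }
        return '<i>' . $m[0] . '</i>';
    }, $content);

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $content;
}

$str = "foo\n bar[shortcode]foo[/shortcode]foo 123\n bar [shortcode]foo[/shortcode] foo";
echo italicize_outside_shortcode($str, 'foo'), PHP_EOL;
echo italicize_outside_shortcode('[shortcode]foo[/shortcode] bar', 'foo'), PHP_EOL;
```

Because the shortcode alternative comes first, the regex engine consumes each block as a single match before it can see the search term inside it, so the result does not depend on what follows the shortcode.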

The following regex matches everything between the shortcode tags, and nothing else:

  • (?<=\[shortcode]).*?(?=\[/shortcode])

It looks for positions right after [shortcode], then matches any string -- as short as possible -- that ends right before [/shortcode].

If you want to print all of these in-between strings in PHP, you need code like this:

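$input = '[shortcode]Lorem ipsum[/shortcode] dolor sit amet'; // sample post content (assumed)
// The lookarounds keep the tags out of each match; the s flag lets . span newlines.
preg_match_all('%(?<=\[shortcode\]).*?(?=\[/shortcode\])%s',
               $input, $result, PREG_PATTERN_ORDER);
// $result[0] holds every full match: the text between each pair of tags.
for ($i = 0; $i < count($result[0]); $i++) {
    echo $result[0][$i];
}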

If you want to use the captured text after the loop, initialize $output = ''; before the loop and replace the echo $result[0][$i]; line with:

  • $output .= $result[0][$i];