Regex to match text between unescaped underscores

I'm trying to wrap the text between underscores in a <b> tag. This is the pattern I'm currently using (link to online tester: TESTER):

[^\\]?_(([^_]*)[^\\])_

This is the result I want to get:

_test1_ _test2__test3_ \_test4\_ => <b>test1</b> <b>test2</b><b>test3</b> \_test4\_

Can anyone tell me what's wrong with my pattern?
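To see concretely what goes wrong, here is a quick JavaScript check of the question's pattern on the sample input. The question doesn't show its replacement string, so `<b>$1</b>` is an assumption here, and `questionPattern` and `result` are just illustrative names.

```javascript
// The pattern from the question, unchanged. The replacement "<b>$1</b>"
// is an assumption: the question does not show its replacement string.
const questionPattern = /[^\\]?_(([^_]*)[^\\])_/g;

const input = "_test1_ _test2__test3_ \\_test4\\_";
const result = input.replace(questionPattern, "<b>$1</b>");

console.log(result);
// <b>test1</b><b>test2_</b>test3_ \_test4\_

// Nothing stops an escaped "\_" from acting as the opening delimiter:
console.log("\\_a_".replace(questionPattern, "<b>$1</b>"));
// \<b>a</b>
```

Three problems show up: `[^\\]?` consumes the space before `_test2_` and drops it because it isn't captured; the final `[^\\]` also matches `_`, so group 1 swallows the underscore that should open `_test3_`; and since `[^\\]?` is optional, the match can simply start at an escaped `_`.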

You may use

(?<!\\)((?:\\{2})*)_([^_\\]*(?:\\.[^_\\]*)*)_

PHP declaration:

$pattern = '~(?<!\\\\)((?:\\\\{2})*)_([^_\\\\]*(?:\\\\.[^_\\\\]*)*)_~';

Replace with $1<b>$2</b>. See the regex demo.
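Since ES2018, JavaScript regexes support lookbehind too, so the same pattern can be tried unchanged in a current engine such as Node.js. A minimal sketch, where `boldUnescaped` is a hypothetical helper name and `$1<b>$2</b>` is the replacement assumed for this two-group pattern:

```javascript
// Lookbehind version of the pattern (needs an ES2018+ engine).
const unescapedPair = /(?<!\\)((?:\\{2})*)_([^_\\]*(?:\\.[^_\\]*)*)_/g;

// Hypothetical helper. Group 1 holds the even run of backslashes in front
// of the opening _, so it is written back before the <b> tag.
function boldUnescaped(text) {
  return text.replace(unescapedPair, "$1<b>$2</b>");
}

console.log(boldUnescaped("_test1_ _test2__test3_ \\_test4\\_"));
// <b>test1</b> <b>test2</b><b>test3</b> \_test4\_

// Two backslashes escape each other, so the _ after them is unescaped:
console.log(boldUnescaped("\\\\_x_"));
// \\<b>x</b>
```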

Details:

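  • (?<!\\)((?:\\{2})*)_ - matches an unescaped _: the _ may be preceded by any number of pairs of \ symbols ((?:\\{2})*, 0+ sequences of two consecutive \ symbols, captured into group 1 so they can be restored in the replacement), as long as that run is not itself preceded by a \ (the (?<!\\) negative lookbehind performs this check)
  • ([^_\\]*(?:\\.[^_\\]*)*)_ - captures into group 2 any mix of chars other than _ and \ and escaped chars (a \ followed by any char), then matches the closing _. Since a _ can only be consumed as part of an escape, the match ends at the first unescaped _.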
    • [^_\\]* - matches 0+ chars other than \ and _
    • (?:\\.[^_\\]*)* - 0+ sequences of:
      • \\. - any escaped char (if you use s DOTALL modifier, even a line break char)
      • [^_\\]* - 0+ chars other than \ and _
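The breakdown above can be checked by assembling the pattern from its pieces and looking at the captured groups. This is a sketch in JavaScript (ES2018+ for the lookbehind); the variable names are illustrative only.

```javascript
// String.raw keeps backslashes literally, so each piece reads exactly
// like the regex text it contributes.
const notAfterBackslash = String.raw`(?<!\\)`;      // no lone \ right before
const evenBackslashes = String.raw`((?:\\{2})*)`;   // group 1: 0+ pairs of \
const body = String.raw`([^_\\]*(?:\\.[^_\\]*)*)`;  // group 2: text and escapes

const assembled = new RegExp(
  notAfterBackslash + evenBackslashes + "_" + body + "_",
  "g",
);

// Source text: \\_x_ and _a\_b_
const sample = "\\\\_x_ and _a\\_b_";
for (const m of sample.matchAll(assembled)) {
  console.log(JSON.stringify({ backslashes: m[1], text: m[2] }));
}
// {"backslashes":"\\\\","text":"x"}
// {"backslashes":"","text":"a\\_b"}
```

The second match shows the escaped `\_` being carried inside group 2 instead of ending the span.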

To use the same approach in regex engines that do not support lookbehind (such as JavaScript engines older than ES2018), use a (^|[^\\]) group instead of (?<!\\):

(^|[^\\])((?:\\{2})*)_([^_\\]*(?:\\.[^_\\]*)*)_

And replace with $1$2<b>$3</b>. See this regex demo.
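One thing worth checking when trying this fallback: as far as I can trace it, (^|[^\\]) has to consume the character before each opening _, so when two spans touch, as in _test2__test3_, the closing _ of the first span is already used up and the second span is skipped. The sketch below shows that, plus a different technique swapped in for comparison (not part of the answer): match either an escape sequence or a whole span, and return escapes unchanged from a replace callback. `boldWithoutLookbehind` is a hypothetical helper name.

```javascript
// Lookbehind-free pattern and replacement from the answer above.
const fallback = /(^|[^\\])((?:\\{2})*)_([^_\\]*(?:\\.[^_\\]*)*)_/g;
const input = "_test1_ _test2__test3_ \\_test4\\_";

console.log(input.replace(fallback, "$1$2<b>$3</b>"));
// <b>test1</b> <b>test2</b>_test3_ \_test4\_

// Alternative technique: an escape (\ plus any char) OR a delimited span.
// Escapes are consumed as units, so they can never open or close a span.
const escapeOrSpan = /\\.|_([^_\\]*(?:\\.[^_\\]*)*)_/g;

function boldWithoutLookbehind(text) {
  // `inner` is undefined when the escape alternative matched.
  return text.replace(escapeOrSpan, (match, inner) =>
    inner === undefined ? match : `<b>${inner}</b>`,
  );
}

console.log(boldWithoutLookbehind(input));
// <b>test1</b> <b>test2</b><b>test3</b> \_test4\_
```

Because an even run of backslashes is eaten pair by pair, \\_x_ still gets bolded, which mirrors what ((?:\\{2})*) does in the pattern above.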