How to match all words except "stop" in a string with regex

Another regex question. I use PHP and have a string: fdjkaljfdlstopfjdslafdj. As you can see, there is a stop in the middle. I want to replace everything except that stop. I tried [^stop], but it also leaves out the lone s later in the string.



My Solution

Thanks for everyone's help here.

I also figured out a solution using regex alone (within my limited knowledge of regex, that is; PCRE verbs are too advanced for me). It takes two steps, though. I don't want to mix PHP functions in, because sometimes the job has nothing to do with coding, e.g. batch-renaming files in Total Commander.

Take the string xxxfooeoropwfoo,skfhlk;afoofsjre,jhgfs,vnhufoolsjunegpq. Say I want to keep every foo in this string and replace each run of non-foo text with ---.

First, match the non-foo text between each pair of foos: (?<=foo).+?(?=foo). Replacing those matches with --- turns the string into xxxfoo---foo---foo---foolsjunegpq, so only the non-foo text at both ends is left.

Then use [^-]+(?=foo)|(?<=foo)[^-]+. This time the result is ---foo---foo---foo---foo---. Everything except foo has been turned into ---.
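For anyone who does want to try the two steps from PHP, here is a minimal sketch using preg_replace. The variable names are only illustrative, and it assumes the input contains no - characters of its own, since step 2 relies on [^-] to leave the placeholders from step 1 alone.

```php
<?php
// Sample string from the post; "foo" is the text to keep.
$s = 'xxxfooeoropwfoo,skfhlk;afoofsjre,jhgfs,vnhufoolsjunegpq';

// Step 1: collapse each run that sits between two "foo"s.
$step1 = preg_replace('/(?<=foo).+?(?=foo)/', '---', $s);

// Step 2: collapse the leading and trailing runs. [^-] skips the
// placeholders that step 1 already inserted.
$step2 = preg_replace('/[^-]+(?=foo)|(?<=foo)[^-]+/', '---', $step1);

echo $step1, PHP_EOL; // xxxfoo---foo---foo---foolsjunegpq
echo $step2, PHP_EOL; // ---foo---foo---foo---foo---
```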

[^stop] doesn't mean any text that is NOT stop. It means any single character that is not one of the 4 characters inside [...], which in this case are s, t, o, and p.
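A quick way to see this, as a small sketch on the string from the question: replacing every [^stop] match with a dash leaves not only stop but also the unrelated s later in the string.

```php
<?php
$s = 'fdjkaljfdlstopfjdslafdj';

// [^stop] is a character class: it matches any one character that is
// not s, t, o, or p, no matter where that character appears.
$masked = preg_replace('/[^stop]/', '-', $s);

echo $masked, PHP_EOL; // ----------stop---s-----
```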

Better to split on the text you don't want to match:

php> $s = 'fdjkaljfdlstopfjdslafdjstopfoobar';

php> $arr = preg_split('/stop/', $s);

php> print_r($arr);
Array
(
    [0] => fdjkaljfdl
    [1] => fjdslafdj
    [2] => foobar
)
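
If the goal is to replace the other text rather than just collect it, the split result can be mapped and glued back together. This is only a sketch: the --- placeholder is borrowed from the asker's own example, and it assumes the text to keep is a fixed string, so it can be reused as the glue.

```php
<?php
$s = 'fdjkaljfdlstopfjdslafdjstopfoobar';

// Split on the text to keep, mask every non-empty piece, then join the
// pieces again with the kept text in between.
$pieces = preg_split('/stop/', $s);
$masked = array_map(fn($piece) => $piece === '' ? '' : '---', $pieces);
$result = implode('stop', $masked);

echo $result, PHP_EOL; // ---stop---stop---
```

For a pattern that can match different texts, the delimiters would have to be captured as well, for example with the PREG_SPLIT_DELIM_CAPTURE flag.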

i just dont want to include "stop"...

You can skip it by using the PCRE verbs (*SKIP)(*F). Try it like this:

stop(*SKIP)(*F)|.

Demo at regex101

or, to match whole sequences: (stop)(*SKIP)(*F)|(?:(?!(?1)).)+

or for words: stop(*SKIP)(*F)|\w+
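
A short sketch of how the first two patterns behave in PHP, using the string from the question (the variable names are just for illustration):

```php
<?php
$s = 'fdjkaljfdlstopfjdslafdj';

// "stop" is consumed and then rejected by (*SKIP)(*F); every other
// character falls through to the "." alternative and gets replaced.
$replaced = preg_replace('/stop(*SKIP)(*F)|./', '-', $s);
echo $replaced, PHP_EOL; // ----------stop---------

// The sequence variant matches whole runs between occurrences of "stop".
preg_match_all('/(stop)(*SKIP)(*F)|(?:(?!(?1)).)+/', $s, $m);
print_r($m[0]); // the two runs: fdjkaljfdl and fjdslafdj
```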

You can generalize this to any pattern:

(?<neg>stop)(*SKIP)(*FAIL)|(?s:.)+?(?=\Z|(?&neg))

Demo

Just put the pattern you don't want in the neg group.

This regex will try to do the following for any character position:

  • Match the pattern you don't want. If it matches, discard it with (*SKIP)(*FAIL) and start the next match attempt right after it.
  • If the pattern you don't want doesn't match at a particular position, then match anything, until either:
    • You reach the end of the input string (\Z)
    • Or the pattern you don't want immediately follows the current matching position ((?&neg))
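
The steps above can be tried out with a small wrapper. This is a sketch, not part of the original answer: matchAllExcept is a hypothetical helper name, and it assumes the fragment passed in is valid PCRE and contains no unescaped ~, which is used as the delimiter here.

```php
<?php
// Hypothetical helper: returns every run of text that is NOT matched
// by $neg. $neg must be a valid PCRE fragment without an unescaped "~".
function matchAllExcept(string $neg, string $subject): array
{
    $pattern = '~(?<neg>' . $neg . ')(*SKIP)(*FAIL)|(?s:.)+?(?=\Z|(?&neg))~';
    preg_match_all($pattern, $subject, $m);
    return $m[0]; // full matches only; the "neg" group is never kept
}

print_r(matchAllExcept('stop', 'fdjkaljfdlstopfjdslafdjstopfoobar'));
// the three runs: fdjkaljfdl, fjdslafdj, foobar

print_r(matchAllExcept('\d+', 'abc123def45'));
// the two runs: abc, def
```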

This approach is slower than a manually tuned expression. You can get better performance at the cost of repeating yourself, which avoids the subroutine call:

stop(*SKIP)(*FAIL)|(?s:.)+?(?=\Z|stop)

But of course, the best approach would be to use the features provided by your language: match the string you don't want, then use code to discard it and keep everything else.

In PHP, you can use the PREG_OFFSET_CAPTURE flag to tell the preg_match_all function to provide you the offsets of each match.
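
A minimal sketch of that idea follows. The helper name piecesOutsideMatches is made up for this example; it matches the unwanted text, then uses the reported byte offsets to collect whatever lies between the matches.

```php
<?php
// Hypothetical helper: returns the pieces of $subject that lie outside
// every match of $pattern.
function piecesOutsideMatches(string $pattern, string $subject): array
{
    // With PREG_OFFSET_CAPTURE each match is a [text, byteOffset] pair.
    preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

    $pieces = [];
    $pos = 0; // end of the previous match
    foreach ($matches[0] as [$text, $offset]) {
        if ($offset > $pos) {
            $pieces[] = substr($subject, $pos, $offset - $pos);
        }
        $pos = $offset + strlen($text);
    }
    // Anything left after the last match.
    if ($pos < strlen($subject)) {
        $pieces[] = substr($subject, $pos);
    }
    return $pieces;
}

print_r(piecesOutsideMatches('/stop/', 'fdjkaljfdlstopfjdslafdjstopfoobar'));
// the three pieces: fdjkaljfdl, fjdslafdj, foobar
```

The offsets are byte offsets, so substr and strlen (rather than their mb_ counterparts) are the matching tools here.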