Regex: replace multiple occurrences of a word surrounded by delimiters

I'm trying to replace all "&nbsp;" in Twig syntax when it's surrounded by "{%" and "%}" or "{{" and "}}".

For example, in the following string:

<p>{{ myFunction()&nbsp; }}</p>    
<p>&nbsp;</p>    
<p>{{ number|number_format(2, "&nbsp;.&nbsp;", '&nbsp;,&nbsp;')&nbsp;}}</p>    
<p>{% set myVariable = '&nbsp;&nbsp;' %}</p>

I want to replace every "&nbsp;" with "" except the one in "<p>&nbsp;</p>".

I'm doing the following:

$content = preg_replace('/({[{%].*)(&nbsp;)(.*[}%]})/', '$1 $3', $content);

but it replaces only one occurrence of "&nbsp;" inside each pair of delimiters.

How can I make it replace all of them?

I'm trying to replace all &nbsp; in twig syntax when it's surrounded by {% and %} or {{ and }}.

If you are looking for the easiest solution, just match all substrings that start with {{ and end with }}, or start with {% and end with %}, with the '~{{.*?}}|{%.*?%}~s' regex, and use that pattern with preg_replace_callback, where you can further manipulate the match value inside the anonymous function:

$s = preg_replace_callback('~{{.*?}}|{%.*?%}~s', function ($m) {
    return str_replace('&nbsp;', '', $m[0]);
}, $s);


Pattern details:

  • {{.*?}} - match {{, then any 0+ characters as few as possible (due to the lazy *? quantifier) up to the closest }}
  • | - or
  • {%.*?%} - match {%, then any 0+ characters as few as possible up to the closest %}
  • s - enables the DOTALL modifier so that . can also match newline characters.
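A minimal, self-contained sketch of the callback approach, run against the sample string from the question. The helper name stripNbspInTwig and its $replacement parameter are hypothetical, added only for illustration; the pattern is the one explained above.

```php
<?php

// Hypothetical helper: replace every &nbsp; that sits inside a Twig
// {{ ... }} or {% ... %} block; text outside the blocks is left alone.
function stripNbspInTwig(string $html, string $replacement = ''): string
{
    return preg_replace_callback(
        '~{{.*?}}|{%.*?%}~s',
        // $m[0] is one whole Twig block, delimiters included
        function (array $m) use ($replacement): string {
            return str_replace('&nbsp;', $replacement, $m[0]);
        },
        $html
    );
}

$input = <<<'DATA'
<p>{{ myFunction()&nbsp; }}</p>
<p>&nbsp;</p>
<p>{{ number|number_format(2, "&nbsp;.&nbsp;", '&nbsp;,&nbsp;')&nbsp;}}</p>
<p>{% set myVariable = '&nbsp;&nbsp;' %}</p>
DATA;

echo stripNbspInTwig($input), "\n";
```

Because the callback only ever sees one block at a time, a plain str_replace is enough there; the regex is only responsible for finding the block boundaries.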

\G (which asserts the position where the previous match ended) is your friend here:

(?:(?:\{{2}|\{%)           # the start 
|
\G(?!\A))                  # or where the previous match ended (not at the string start)
(?:(?!(?:\}{2}|%\})).)*?\K # do not overrun the closing delimiter
&nbsp;                     # match a &nbsp;

See a demo on regex101.com.
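To see the \G chaining at work, the sketch below uses the same pattern written on one line and lists the match offsets in a small made-up string ($sample is an assumption for illustration). Because of \K, each reported match is just the &nbsp;; the second match starts exactly where the first one ended, and the &nbsp; after }} is never matched because a new chain can only start at {{ or {%.

```php
<?php

// Same pattern as above: start at {{ or {%, or continue at \G (the end of
// the previous match, but not at the very start of the subject).
$regex = '~(?:(?:\{{2}|\{%)|\G(?!\A))(?:(?!(?:\}{2}|%\})).)*?\K&nbsp;~';

// Made-up sample: two &nbsp; inside the block, one outside it.
$sample = '{{ a&nbsp;b&nbsp; }} &nbsp;';

preg_match_all($regex, $sample, $matches, PREG_OFFSET_CAPTURE);

// Each entry is [matched text, byte offset]; \K makes the text just "&nbsp;".
foreach ($matches[0] as [$text, $offset]) {
    echo "$text at offset $offset\n";
}
```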


In PHP:
<?php

$string = <<<DATA
<p>{{ myFunction()&nbsp; }}</p>    
<p>&nbsp;</p>    
<p>{{ number|number_format(2, "&nbsp;.&nbsp;", '&nbsp;,&nbsp;')&nbsp;}}</p>    
<p>{% set myVariable = '&nbsp;&nbsp;' %}</p>
DATA;

$regex = '~
            (?:(?:\{{2}|\{%)
            |
            \G(?!\A))
            (?:(?!(?:\}{2}|%\})).)*?\K
            &nbsp;
          ~x';
// Replace each &nbsp; inside {{ ... }} or {% ... %} with a space
$string = preg_replace($regex, ' ', $string);
echo $string;

?>

A full code example can be found on ideone.com.

Regex:

&nbsp;(?=(?:(?!{[{%]).)*[%}]})

Explanation:

&nbsp;              # Match non-breaking spaces (HTML entity)
(?=                 # Start of positive lookahead
    (?:                 # Start of non-capturing group (a)
        (?!{[{%])           # Assert that the next 2 characters are not {{ or {% (negative lookahead)
    .)*                 # Match any character except a newline, zero or more times (greedy) (end of (a))
    [%}]}               # Up to a }} or %}
)                   # End of positive lookahead

In simple words, it matches every &nbsp; that is eventually followed by a %} or }} with no {{ or {% in between, which means it sits inside a {{...}} or {%...%} block.
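A quick way to check which entities the lookahead pattern selects is to count matches on the question's sample. This is a minimal sketch; the variable names $total and $matched are only for illustration. Of the nine &nbsp; entities, only the one in <p>&nbsp;</p> should be skipped.

```php
<?php

$input = <<<'DATA'
<p>{{ myFunction()&nbsp; }}</p>
<p>&nbsp;</p>
<p>{{ number|number_format(2, "&nbsp;.&nbsp;", '&nbsp;,&nbsp;')&nbsp;}}</p>
<p>{% set myVariable = '&nbsp;&nbsp;' %}</p>
DATA;

$regex = '/&nbsp;(?=(?:(?!{[{%]).)*[%}]})/';

$total   = substr_count($input, '&nbsp;');   // every &nbsp; in the string
$matched = preg_match_all($regex, $input);   // only those inside a block

echo "$matched of $total &nbsp; entities are inside a Twig block\n";
```

Since the lookahead consumes nothing, consecutive entities such as '&nbsp;&nbsp;' are each matched on their own.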

Note

If the closing delimiters are not on the same line, as below:

<p>{{ myFunction()&nbsp;

}}</p>    
<p>&nbsp;</p>    
<p>{{ number|number_format(2, "&nbsp;.&nbsp;", '&nbsp;,&nbsp;')&nbsp;

}}</p>    
<p>{% set myVariable = '&nbsp;&nbsp;'
%}</p>

then you will need the s modifier, turned on by prepending (?s) to the regex:

(?s)&nbsp;(?=(?:(?!{[{%]).)*[%}]})

You may use it by default as well.
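The effect of (?s) can be checked on the multi-line sample above. This minimal sketch counts matches with and without the modifier (the names $withoutS and $withS are only for illustration). Without it, . stops at the end of each line, so no &nbsp; can reach a closing }} or %} on a later line.

```php
<?php

$input = <<<'DATA'
<p>{{ myFunction()&nbsp;

}}</p>
<p>&nbsp;</p>
<p>{{ number|number_format(2, "&nbsp;.&nbsp;", '&nbsp;,&nbsp;')&nbsp;

}}</p>
<p>{% set myVariable = '&nbsp;&nbsp;'
%}</p>
DATA;

$withoutS = '/&nbsp;(?=(?:(?!{[{%]).)*[%}]})/';
$withS    = '/(?s)&nbsp;(?=(?:(?!{[{%]).)*[%}]})/';

echo preg_match_all($withoutS, $input), "\n"; // . cannot cross newlines
echo preg_match_all($withS, $input), "\n";    // . also matches "\n"

// With (?s) the <p>&nbsp;</p> entity is still left alone: the next
// delimiter after it is the {{ on the following line, not a }} or %}.
echo preg_replace($withS, ' ', $input), "\n";
```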


PHP:

$output = preg_replace('/&nbsp;(?=(?:(?!{[{%]).)*[%}]})/', ' ', $input);
