Replace HTML entities between divs, and only between divs

Given the following string:

asd &nbsp; <div> def &nbsp; foo &nbsp; </div> ghi &nbsp; <div> moo &nbsp; </div>

I want to remove all of the &nbsp;'s that are within <div>s, resulting in:

asd &nbsp; <div> def  foo  </div> ghi &nbsp; <div> moo  </div>

I can use any standard PHP stuff, but I'm not sure how to approach the problem. I couldn't figure out how to keep the contents inside the <div>s while removing the &nbsp;.

I need this because WordPress's content filter adds &nbsp; in strange situations. I can't simply remove every &nbsp;, because some might have been entered deliberately by the user, but I do need to remove all of them within the element that has display problems caused by them.

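    // Callback: strip &nbsp; from the div's contents, keeping its attributes.
    function filter_nbsp($matches)
    {
        return "<div" . $matches[1] . ">" . str_replace("&nbsp;", "", $matches[2]) . "</div>";
    }

    $text = "asd &nbsp; <div> def &nbsp; </div> ghi &nbsp; <div> moo &nbsp; </div>";

    // The contents group is a plain lazy (.*?) so that a div without any &nbsp;
    // cannot make the match run on past its own </div>; the "s" flag lets it span newlines.
    echo preg_replace_callback(
        '#<div\b([^>]*)>(.*?)</div>#is',
        "filter_nbsp",
        $text
    );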

This should work for entities inside div elements that are closed with </div>.

Output:

asd &nbsp; <div> def  </div> ghi &nbsp; <div> moo  </div> 

The following works in your case:

$str = "asd &nbsp; <div> def &nbsp; </div> ghi &nbsp; <div> moo &nbsp; </div>";
$res = preg_replace("%<div>(.*?)&nbsp;(.*?)</div>%", "<div>$1$2</div>", $str);

But be aware of its limitations:

  • It won't work if the divs have attributes;
  • It won't work as expected if the divs are nested;
  • It removes only one &nbsp; per div, so any further &nbsp;s inside the same div are left untouched.

So the replacement above is not a good solution at all. It's much better to first find the div elements with an (XML) parser function and then remove all the &nbsp;s inside them.
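
If pulling in a parser is not an option, the three caveats above can also be worked around by splitting the string on div tags and tracking the nesting depth. This is only a rough sketch, not a replacement for a real parser: the function name strip_nbsp_inside_divs is made up for illustration, and it assumes every <div> has a matching </div> and that no attribute value contains a ">" character.

```php
<?php
// Hypothetical helper: remove &nbsp; only from text that sits inside at least one <div>.
function strip_nbsp_inside_divs(string $html): string
{
    // Split on opening/closing div tags (with or without attributes), keeping the tags.
    $parts = preg_split('#(</?div\b[^>]*>)#i', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    $depth = 0; // how many divs we are currently inside
    $out = '';
    foreach ($parts as $part) {
        if (preg_match('#^<div\b#i', $part)) {
            $depth++;
        } elseif (preg_match('#^</div\b#i', $part)) {
            $depth = max(0, $depth - 1);
        } elseif ($depth > 0) {
            // Plain text inside a div: drop every &nbsp;, not just the first.
            $part = str_replace('&nbsp;', '', $part);
        }
        $out .= $part;
    }
    return $out;
}

echo strip_nbsp_inside_divs('asd &nbsp; <div> def &nbsp; foo &nbsp; </div> ghi &nbsp; <div> moo &nbsp; </div>');
// prints: asd &nbsp; <div> def  foo  </div> ghi &nbsp; <div> moo  </div>
```

Because the depth counter only drops back to zero after the outermost </div>, nested divs are handled, and str_replace removes every occurrence in each text chunk.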

simple_html_dom

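    // Assumes the library file simple_html_dom.php is on the include path.
    include 'simple_html_dom.php';

    $html = str_get_html('asd &nbsp; <div> def &nbsp; </div> ghi &nbsp; <div> moo &nbsp; </div>');

    foreach ($html->find('div') as $element) {
        // Read innertext rather than plaintext so markup nested in the div is kept.
        $element->innertext = str_replace('&nbsp;', '', $element->innertext);
    }

    echo $html;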