Replace all non-word characters in a UTF-8 string [duplicate]

How can I replace all non-word characters in a UTF-8 string?

For ASCII, I use:

$url = preg_replace("/\W+/", " ", $url);

Is there an equivalent for UTF-8?

You can use the Xwd character class, which contains letters, digits, and the underscore:

$url = preg_replace('~\P{Xwd}+~u', ' ', $url);

If you don't want the underscore, you can use Xan instead.

\p{Xwd} (Perl word character) is a predefined character class and \P{Xwd} is the negation of this class.

The u modifier makes the regex engine treat both the pattern and the subject string as UTF-8 rather than as a sequence of single bytes.
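To illustrate what the u modifier changes, here is a minimal sketch. It assumes the source file is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// "я" (U+044F) is one character but two bytes in UTF-8.
$char = "я";

// Without u, "." matches a single byte, so a two-byte character fails /^.$/.
$byteMode = preg_match('/^.$/', $char);

// With u, "." matches a whole UTF-8 character.
$utf8Mode = preg_match('/^.$/u', $char);

var_dump($byteMode, $utf8Mode); // int(0) and int(1)
```

The same byte-versus-character distinction is why \W without u splits multibyte characters apart.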

Equivalences:

\p{Xan}        <=>     [\p{L}\p{N}]
\p{Xwd}        <=>     [\p{Xan}_]
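As a rough sketch of the difference between the two classes, the following compares them on a hypothetical sample string. It assumes a PCRE build with Unicode property support and a UTF-8 source file.

```php
<?php
// Hypothetical sample input; any valid UTF-8 string works.
$url = "Привет_мир, 東京 2024!";

// \P{Xwd}: anything that is not a letter, digit, or underscore.
$withUnderscore = preg_replace('~\P{Xwd}+~u', ' ', $url);

// \P{Xan}: anything that is not a letter or digit, so "_" is replaced too.
$withoutUnderscore = preg_replace('~\P{Xan}+~u', ' ', $url);

echo $withUnderscore, PHP_EOL;    // "Привет_мир 東京 2024 " (trailing space)
echo $withoutUnderscore, PHP_EOL; // "Привет мир 東京 2024 " (trailing space)
```

Each run of excluded characters collapses to one space, including the final "!", so you may want to trim() the result.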

Use Unicode properties:

$url = preg_replace("/[^\p{L}\p{N}_]+/u", " ", $url);

\p{L} matches any letter.
\p{N} matches any kind of number.
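One possible way to wrap this pattern is sketched below. The function name replaceNonWordChars and the trimming step are assumptions of this example, not part of the original answer. It requires PHP 7 or later for the type declarations.

```php
<?php
/**
 * Hypothetical helper: replaces each run of characters that are not
 * letters, numbers, or underscores with a single space, then trims the ends.
 */
function replaceNonWordChars(string $text): string
{
    $result = preg_replace('/[^\p{L}\p{N}_]+/u', ' ', $text);

    // With the u modifier, preg_replace() returns null on invalid UTF-8 input.
    if ($result === null) {
        throw new InvalidArgumentException(
            'preg_replace failed, error code ' . preg_last_error()
        );
    }

    return trim($result);
}

echo replaceNonWordChars("Привет, мир! 東京_2024?"), PHP_EOL;
// Привет мир 東京_2024
```

Checking for null matters because an invalid byte sequence would otherwise silently turn the whole string into null.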