How do I replace all repeated punctuation with a single punctuation mark in PHP?

Is there an efficient way to replace all duplicate non-alphanumeric characters with single characters?

This question forces one to be explicit about the punctuation characters:

PHP - Removing Duplicate Punctuation?

Like so:

$str = preg_replace('~[?!]{2,}~', '?', preg_replace('~([.,!?])(\\1+)~', '\\1', $str));

Is it possible to achieve the same result but for ALL non-alphanumeric characters without explicitly referencing them by name?

Here's a use case:

Hello...  how   are you!!??  I''m bored!!----!!!&&&&&^^^^%%%(()))((<<<<<

to

Hello. how are you!? I'm bored!-!&^%()(<

UPDATE

Unfortunately, the above cuts too deep in one use case: http://. How can one keep the double / in URLs (or simply whenever it follows a :), while still collapsing other repeated / and never allowing more than two / after a :? Here is a single use case:

My ////favorite//// site is http://///example.com!!!!!!!

becomes:

My /favorite/ site is http://example.com!

You can use:

$str = preg_replace('~((?<!:)[^\p{L}\p{N}])\1+~u', '$1', $str);
//=> Hello. how are you!? I'm bored!-!&^%()(<


  • [^\p{L}\p{N}] - Match any character that is not a Unicode letter or number
  • (?<!:) - Negative lookbehind: match only if not preceded by :, so the // in http://... is left alone
  • ((?<!:)[^\p{L}\p{N}]) - Capture that character in group #1 for the back-reference
  • \1+ - Match one or more repeats of captured group #1, so only runs of 2 or more of the same non-alphanumeric character match
  • Replace the match with $1, i.e. the captured non-alphanumeric character
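
To check the pattern against both use cases from the question, it can be wrapped in a small function. This is only a sketch: the name collapse_repeats is hypothetical, and returning the input unchanged when preg_replace() yields null (for example on invalid UTF-8 with the u flag) is an assumption about the desired behaviour, not part of the original answer.

```php
<?php
// Hypothetical helper; wraps the pattern from the answer above.
function collapse_repeats(string $str): string
{
    // ((?<!:)[^\p{L}\p{N}]) captures one non-alphanumeric character that is
    // not directly preceded by ':'; \1+ then consumes its repeats.
    // preg_replace() returns null on a PCRE error, so fall back to the input
    // (an assumption, see the note above).
    return preg_replace('~((?<!:)[^\p{L}\p{N}])\1+~u', '$1', $str) ?? $str;
}

echo collapse_repeats("Hello...  how   are you!!??  I''m bored!!----!!!&&&&&^^^^%%%(()))((<<<<<"), "\n";
// Hello. how are you!? I'm bored!-!&^%()(<

echo collapse_repeats('My ////favorite//// site is http://///example.com!!!!!!!'), "\n";
// My /favorite/ site is http://example.com!
```

Note that the lookbehind exempts whichever character directly follows a colon, not only /, so a run such as :!!! is reduced to :!! rather than :!.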