Is there an efficient way to replace all duplicate non-alphanumeric characters with single characters?
The related question "PHP - Removing Duplicate Punctuation?" forces one to list the punctuation characters explicitly, like so:
$str = preg_replace('~[?!]{2,}~', '?', preg_replace('~([.,!?])(\\1+)~', '\\1', $str));
Is it possible to achieve the same result but for ALL non-alphanumeric characters without explicitly referencing them by name?
Here's a use case:
Hello... how are you!!?? I''m bored!!----!!!&&&&&^^^^%%%(()))((<<<<<
to
Hello. how are you!? I'm bored!-!&^%()(<
UPDATE
Unfortunately the above cuts too deep in one use case: http://. How can one keep the double / for URLs (or simply whenever it follows a :), but still collapse repeated / elsewhere, and also collapse more than two / after a :? Here is a single use case:
My ////favorite//// site is http://///example.com!!!!!!!
becomes:
My /favorite/ site is http://example.com!
You can use:
$str = preg_replace('~((?<!:)[^\p{L}\p{N}])\1+~u', '$1', $str);
//=> Hello. how are you!? I'm bored!-!&^%()(<
[^\p{L}\p{N}] - Match anything but a Unicode alphanumeric character
(?<!:) - Match only if not preceded by : (to take care of http://...)
((?<!:)[^\p{L}\p{N}]) - Capture the above in group #1 for the back-reference
\1+ - Match one or more repeats of captured group #1, thus making sure 2 or more of the same non-alphanumeric character are matched
$1 - The replacement, i.e. the captured non-alphanumeric character
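To check the pattern against both use cases from the question, here is a minimal sketch. The function names `collapse_repeats` and `collapse_repeats_naive` are hypothetical wrappers added for illustration; only the first pattern comes from the answer, and the second simply drops the lookbehind to show what it protects against.

```php
<?php
// Hypothetical helper name; the pattern itself is the one from the answer.
function collapse_repeats(string $str): ?string
{
    // Collapse any run of the same non-alphanumeric character to one,
    // except that a character directly after ':' never starts a run.
    return preg_replace('~((?<!:)[^\p{L}\p{N}])\1+~u', '$1', $str);
}

// Same pattern without the lookbehind, for contrast only.
function collapse_repeats_naive(string $str): ?string
{
    return preg_replace('~([^\p{L}\p{N}])\1+~u', '$1', $str);
}

$text = "Hello... how are you!!?? I''m bored!!----!!!&&&&&^^^^%%%(()))((<<<<<";
$url  = 'My ////favorite//// site is http://///example.com!!!!!!!';

echo collapse_repeats($text), "\n";       // Hello. how are you!? I'm bored!-!&^%()(<
echo collapse_repeats($url), "\n";        // My /favorite/ site is http://example.com!
echo collapse_repeats_naive($url), "\n";  // My /favorite/ site is http:/example.com!
```

Two things to keep in mind when using it. With the `u` modifier, `preg_replace` returns `null` if the input is not valid UTF-8, hence the nullable return type. Also, the lookbehind exempts the first character after any `:`, not just a `/`, so an input such as `a:!!!` becomes `a:!!` rather than `a:!`.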