I'm looking for help with a problem I have with PHP's preg_replace function. I wrote a regex to match acronyms and abbreviations. Some of them use a dash or a dot to separate letters, some don't.
\p{Lu}+(\p{Zs}?[.-]\p{Zs}?){1,10}
My goal is to replace dashes and dots with #, and I'm trying to use:
$re = '/\p{Lu}+(\p{Zs}?[.-]\p{Zs}?){1,10}/i';
$str ='normal text C.G. P- U.T.O .K.L. another normal text';
$subst = '${1}#';
$result = preg_replace($re, $subst, $str);
As I understand it, this should replace the first capturing group (a dash or a dot) with a #. But in fact, it replaces the letter.
For instance, for the string C.G. P- U.T.O .K.L. I expect to get CGPUTOKL, but in fact I get .#. #- #.#.# .#.#..
You can see all of this at https://regex101.com/r/gkeGiw/4.
Could you tell me where I'm going wrong (and why)?
Thank you in advance,
Regards,
Charles
You need preg_replace_callback. As @SebastianProske said, you were capturing the bit you don't want. However, you can't just capture the bit you do want inside a repeating pattern, because each iteration overwrites the previous capture, so you'd only get the last letter of each acronym. Instead, match the entire acronym and then scrub the match. This assumes a minimum of 2 letters per abbreviation (and, because of the {1,9} quantifier, a maximum of 10):
$text_abbreviation_normalised = preg_replace_callback(
'/\p{Lu}(?:(?:\p{Zs}*[.-]\p{Zs}*)?\p{Lu}){1,9}(?:\p{Zs}*\.)?/',
function($matches) {
return preg_replace('/\P{Lu}+/', '', $matches[0]);
},
$text
);
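As a quick check, the snippet above can be run against the sample string from the question. This is only a sketch: the wrapper function name normalise_abbreviations is hypothetical and not part of the original answer, while the pattern and the callback are copied from it unchanged.

```php
<?php
// Hypothetical wrapper name; pattern and callback are taken from the answer as-is.
function normalise_abbreviations(string $text): string
{
    return preg_replace_callback(
        // One uppercase letter, then 1 to 9 more, each optionally preceded by
        // a dot or dash with optional spaces, then an optional trailing dot.
        '/\p{Lu}(?:(?:\p{Zs}*[.-]\p{Zs}*)?\p{Lu}){1,9}(?:\p{Zs}*\.)?/',
        function (array $matches): string {
            // Strip everything that is not an uppercase letter from the match.
            return preg_replace('/\P{Lu}+/', '', $matches[0]);
        },
        $text
    ) ?? $text; // fall back to the input if PCRE reports an error
}

$str = 'normal text C.G. P- U.T.O .K.L. another normal text';
$result = normalise_abbreviations($str);
echo $result, PHP_EOL; // normal text CGPUTOKL another normal text
```

Note that the neighbouring acronyms are merged into one because the pattern allows spaces around each separator, which matches the output the question asks for. If the input may contain non-ASCII capitals in UTF-8, adding the u modifier would probably be needed; that is an assumption beyond the original answer.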
See https://regex101.com/r/gkeGiw/7 for the explanation.
It's technically possible to do that without a callback but the regex would be hideous.
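To make the two points about capturing groups concrete, here is a small sketch on the string U.T.O. The first call uses the question's pattern (without the i modifier) to show that the whole match is replaced and only the captured separator is put back. The second uses a simplified pattern, made up for this illustration, that captures the letter inside a repetition.

```php
<?php
// 1) The question's approach: the letter is part of the match but not of
//    group 1, so it is dropped, and the separator is put back via ${1}.
$demo = preg_replace('/\p{Lu}+(\p{Zs}?[.-]\p{Zs}?){1,10}/', '${1}#', 'U.T.O');
echo $demo, PHP_EOL; // .#.#O

// 2) Capturing the letter inside a repeated group (illustrative pattern):
//    every iteration overwrites group 1, so only the last letter survives.
preg_match('/(?:(\p{Lu})[.-]?)+/', 'U.T.O', $m);
echo $m[0], PHP_EOL; // U.T.O
echo $m[1], PHP_EOL; // O
```

This is why the answer matches the whole acronym first and cleans it up in a callback, rather than trying to rebuild it from capture groups.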