I've searched for a while, so hopefully this question has not been asked many times already.
I'm trying to write a PHP script that removes stop words from a string and then explodes it into an array of words. The stop words could be in English or French.
Currently the following is not working for me as it doesn't remove French characters:
$needles=array(
'/\bil\b/i',
'/\bla\b/i',
'/\ble\b/i',
'/\b'. htmlentities('à') .'\b/i'
);
print_r($needles);
$result=preg_replace($needles, "", htmlentities("il y à trois personne dans la salle à manger"));
print_r($result);
The output has every stop word removed except the French character à.
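As noted in the comments, htmlentities('à') returns the entity &agrave;, so the fourth pattern becomes [3] => /\b&agrave;\b/i. It won't match your letter: & and ; are not word characters, so \b finds no word boundary between them and the surrounding spaces.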
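To see why the entity-encoded pattern fails, here is a small check. It is only a sketch and assumes the file is saved as UTF-8 and runs on PHP 5.4 or later, where htmlentities() defaults to UTF-8; the variable names are made up for illustration.

```php
<?php
// Assumption: UTF-8 source file, PHP >= 5.4 (htmlentities() defaults to UTF-8).
$encoded = htmlentities('à');                 // "&agrave;"
$entityPattern = '/\b' . $encoded . '\b/i';   // same construction as in the question
$entitySubject = htmlentities('salle à manger');

// \b needs a word character on exactly one side. Here a space sits next to "&"
// and ";" sits next to a space, so no boundary exists and nothing matches.
$entityMatches = preg_match($entityPattern, $entitySubject);

// With the literal character and the u flag, à counts as a word character.
$unicodeMatches = preg_match('/\bà\b/iu', 'salle à manger');

var_dump($encoded, $entityMatches, $unicodeMatches);
```

The first preg_match() call reports no match and the second reports one, which is the behaviour described in the question.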
Instead, use the literal à with the u flag to enable Unicode in the pattern:
'/\bà\b/iu'
See demo
<?php
$needles=array(
'/\bil\b/i',
'/\bla\b/i',
'/\ble\b/i',
'/\bà\b/iu'
);
print_r($needles);
$result=preg_replace($needles, "", "il y à trois personne dans la salle à manger");
print_r($result);
Output (the removed words leave extra spaces behind, which are collapsed here):
y trois personne dans salle manger
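The question also asks for the result as an array of words. One possible way to get there, sketched under the assumption of PHP 7 or later and valid UTF-8 input, is to build a single pattern from the stop word list and then split on whitespace. The function name remove_stop_words and its parameters are hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper: strips stop words, then returns the remaining words.
// Assumes PHP >= 7 and that $text and $stopWords are valid UTF-8.
function remove_stop_words(string $text, array $stopWords): array
{
    // Escape each stop word so it is safe to embed in a pattern.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $stopWords);

    // One alternation with the u flag, so \b also works around accented letters.
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/iu';

    $cleaned = preg_replace($pattern, '', $text);
    if ($cleaned === null) {
        return []; // preg_replace() returns null on error, e.g. invalid UTF-8
    }

    // Split on runs of whitespace and drop the empty pieces left by removals.
    return preg_split('/\s+/u', $cleaned, -1, PREG_SPLIT_NO_EMPTY);
}

$words = remove_stop_words(
    'il y à trois personne dans la salle à manger',
    ['il', 'la', 'le', 'à']
);
print_r($words); // y, trois, personne, dans, salle, manger
```

Splitting with PREG_SPLIT_NO_EMPTY avoids the empty array entries that a plain explode(' ', ...) would produce from the doubled spaces.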