My question: given the following PHP code that compares two strings:
$cadena1='JUAN LÓPEZ YÁÑEZ';
$cadena2='JUAN LOPEZ YÁÑEZ';
if ($cadena1 === $cadena2) {
    echo '<p style="color: green;">The strings match!</p>';
} else {
    echo '<p style="color: red;">The strings do not match. Accent sensitive?</p>';
}
I notice, for example, that comparing LOPEZ with LÓPEZ makes the comparison return false.
Is there a way, or an existing function, to compare these strings while ignoring Spanish accents?
I would replace all accents in your strings before comparing them. You can do that using the following code:
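$replacements = array('Ó' => 'O', 'Á' => 'A', 'Ñ' => 'N'); // Add the remaining Spanish accented letters.
$output1 = strtr($cadena1, $replacements);
$output2 = strtr($cadena2, $replacements);
Apply the replacements to both strings, because $cadena2 also contains Á and Ñ. After that, $output1 and $output2 are both 'JUAN LOPEZ YANEZ', so $output1 === $output2 is true.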
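A fuller version of that replacement map might look like the sketch below. The function name removeSpanishAccents is hypothetical, and the set of letters (accented vowels, ü and ñ, upper and lower case) is my assumption about what "Spanish accents" should cover. It assumes the source file and the input strings are UTF-8.

```php
<?php
// Hypothetical helper: maps Spanish accented letters (plus ü and ñ) to plain ASCII.
// Assumes both the source file and the input strings are UTF-8 encoded.
function removeSpanishAccents(string $text): string
{
    $replacements = [
        'Á' => 'A', 'É' => 'E', 'Í' => 'I', 'Ó' => 'O', 'Ú' => 'U', 'Ü' => 'U', 'Ñ' => 'N',
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u', 'ü' => 'u', 'ñ' => 'n',
    ];
    // strtr() with an array matches whole (multibyte) keys and never re-scans
    // text it has already replaced, so UTF-8 keys are safe here.
    return strtr($text, $replacements);
}

$cadena1 = 'JUAN LÓPEZ YÁÑEZ';
$cadena2 = 'JUAN LOPEZ YÁÑEZ';

// Normalize BOTH strings before comparing them.
$iguales = removeSpanishAccents($cadena1) === removeSpanishAccents($cadena2);
var_dump($iguales); // bool(true)
```

This only catches precomposed characters: an ñ written as n followed by a combining tilde would slip through unchanged, which is one reason to prefer the intl-based approaches.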
You could try the soundex() function, which works at least for your example:
var_dump(soundex('LOPEZ'));
// string(4) "L120"
var_dump(soundex('LÓPEZ'));
// string(4) "L120"
You would have to test it with other words; if the results are not good enough, you could try similar_text().
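Because soundex() returns only a four-character code, calling it on a whole name such as 'JUAN LÓPEZ YÁÑEZ' would ignore most of the name. A minimal sketch that compares word by word follows; the helper name soundsAlike is hypothetical. Note that the match is fuzzy: LOPEZ and LOPES also count as equal.

```php
<?php
// Hypothetical helper: compares two names word by word using soundex().
function soundsAlike(string $a, string $b): bool
{
    $wordsA = preg_split('/\s+/', trim($a));
    $wordsB = preg_split('/\s+/', trim($b));
    if (count($wordsA) !== count($wordsB)) {
        return false;
    }
    foreach ($wordsA as $i => $word) {
        // soundex() skips bytes that are not ASCII letters, so the UTF-8
        // bytes of an accented vowel are simply ignored.
        if (soundex($word) !== soundex($wordsB[$i])) {
            return false;
        }
    }
    return true;
}

var_dump(soundsAlike('JUAN LÓPEZ YÁÑEZ', 'JUAN LOPEZ YÁÑEZ')); // bool(true)
var_dump(soundsAlike('LOPEZ', 'LOPES'));                       // bool(true): fuzzy, not exact
```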
The two strings compare as unequal because they are different byte sequences. To compare them, you first need to normalize them in some way.
The best way to do that is to use the Transliterator class, part of the intl extension on PHP 5.4+.
Test code:
<?php
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD);
$test = ['abcd', 'èe', '€', 'àòùìéëü', 'àòùìéëü', 'tiësto'];
foreach ($test as $e) {
    $normalized = $transliterator->transliterate($e);
    echo $e . ' --> ' . $normalized . "\n";
}
?>
Result:
abcd --> abcd
èe --> ee
€ --> €
àòùìéëü --> aouieeu
àòùìéëü --> aouieeu
tiësto --> tiesto
(Taken from my answer to "mySQL - matching latin (english) form input to utf8 (non-English) data".)
This replaces characters according to the tables of the ICU library, which are very complete and well tested. Before removing accents, the rule decomposes the string (NFD), so it handles every possible representation of a character like ñ: it can be a single precomposed character (U+00F1) or the letter n followed by the combining tilde (U+0303). The combining marks are then removed and the result is recomposed (NFC).
Unlike soundex(), this does not compare how words sound, so it is more accurate.
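Wrapped into a reusable comparison, the same rule might look like this sketch; the function name accentInsensitiveEquals is hypothetical. It needs the intl extension, and the "\u{...}" escapes used to build the two forms of ñ need PHP 7.0+.

```php
<?php
// Requires the intl extension. Same rule as above: decompose (NFD),
// drop nonspacing marks, recompose (NFC). Comparison stays case-sensitive.
function accentInsensitiveEquals(string $a, string $b): bool
{
    static $transliterator = null;
    if ($transliterator === null) {
        $transliterator = Transliterator::createFromRules(
            ':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
            Transliterator::FORWARD
        );
    }
    return $transliterator->transliterate($a) === $transliterator->transliterate($b);
}

var_dump(accentInsensitiveEquals('JUAN LÓPEZ YÁÑEZ', 'JUAN LOPEZ YÁÑEZ')); // bool(true)

// The same ñ written two ways: precomposed U+00F1 vs. "n" + combining tilde U+0303.
$precomposed = "a\u{00F1}o";
$decomposed  = "an\u{0303}o";
var_dump($precomposed === $decomposed);                       // bool(false)
var_dump(accentInsensitiveEquals($precomposed, $decomposed)); // bool(true)
```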
Why not just use collations from the intl extension, via the Collator class?
(For the other strength levels and attributes, see the ICU or PHP documentation.)
$cadena1 = 'JUAN LÓPEZ YÁÑEZ';
$cadena2 = 'JUAN LOPEZ YÁÑEZ';
$coll = new Collator('es_ES');
$coll->setStrength(Collator::PRIMARY);
// Uncomment to make case significant while still ignoring accents:
//$coll->setAttribute(Collator::CASE_LEVEL, Collator::ON);
var_dump($coll->compare($cadena1, $cadena2)); // 0 = equals
(Of course, the strings must be UTF-8 encoded.)
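One thing to watch with the es_ES locale: Spanish collation treats ñ as a separate letter after n, so Ñ and N differ even at PRIMARY strength (harmless in the example, where both strings contain Ñ). The sketch below, which assumes the intl extension, contrasts it with the 'root' collation and shows what CASE_LEVEL does.

```php
<?php
// Requires the intl extension; strings must be UTF-8.
$es = new Collator('es_ES');
$es->setStrength(Collator::PRIMARY);

// At primary strength, accents and case are ignored...
var_dump($es->compare('LÓPEZ', 'lopez')); // int(0)

// ...but the Spanish tailoring sorts ñ as its own letter after n,
// so Ñ and N still differ at primary strength.
var_dump($es->compare('YÁÑEZ', 'YANEZ') === 0); // bool(false)

// The root (UCA) collation has no such tailoring: ñ differs from n only by its accent.
$root = new Collator('root');
$root->setStrength(Collator::PRIMARY);
var_dump($root->compare('YÁÑEZ', 'YANEZ')); // int(0)

// CASE_LEVEL on: accents are still ignored, but case now matters.
$es->setAttribute(Collator::CASE_LEVEL, Collator::ON);
var_dump($es->compare('LÓPEZ', 'LOPEZ'));       // int(0)
var_dump($es->compare('LÓPEZ', 'lopez') === 0); // bool(false)
```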
Try this function from http://sourcecookbook.com/en/recipes/8/function-to-slugify-strings-in-php. It replaces the non-ASCII characters in a string with ASCII equivalents.
$cadena1='JUAN LÓPEZ YÁÑEZ';
$cadena2='JUAN LOPEZ YÁÑEZ';
function slugify( $text ) {
    // Replace runs of characters that are not letters or digits with '-'.
    $text = preg_replace('~[^\\pL\d]+~u', '-', $text);
    $text = trim($text, '-');

    /**
     * //TRANSLIT: characters without an exact ASCII equivalent are transliterated (Ó -> O).
     * //IGNORE: characters that cannot be represented in the target charset are silently discarded.
     * Combining both avoids errors on untranslatable characters while still translating the others.
     * Note: //TRANSLIT output depends on the iconv implementation (with glibc, also on the
     * current LC_CTYPE locale), so test it on your server.
     */
    $text = iconv('utf-8', 'ASCII//IGNORE//TRANSLIT', $text);
    $text = strtolower(trim($text));

    // Remove any remaining characters other than word characters and '-'.
    $text = preg_replace('~[^-\w]+~', '', $text);

    return empty($text) ? '' : $text;
}
var_dump( slugify( $cadena1 ) ); // string(16) "juan-lopez-yanez"
var_dump( slugify( $cadena2 ) ); // string(16) "juan-lopez-yanez"
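Since iconv transliteration varies between systems, a variant that swaps in ICU transforms from the intl extension may be more predictable. This is a different technique from the recipe above, and the function name slugifyIntl is hypothetical.

```php
<?php
// Requires the intl extension. Uses ICU transform IDs instead of iconv(),
// so the result does not depend on the iconv implementation or the locale.
function slugifyIntl(string $text): string
{
    // Convert any script to Latin, then Latin to plain ASCII, then lowercase.
    $text = transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $text);
    // Replace every run of characters other than a-z and 0-9 with a single '-'.
    $text = preg_replace('~[^a-z0-9]+~', '-', $text);
    return trim($text, '-');
}

var_dump(slugifyIntl('JUAN LÓPEZ YÁÑEZ')); // string(16) "juan-lopez-yanez"
var_dump(slugifyIntl('JUAN LOPEZ YÁÑEZ')); // string(16) "juan-lopez-yanez"
```

Comparing slugIntl($cadena1) === slugIntl($cadena2) then ignores accents, case, and punctuation all at once, which may be more lenient than you want for an equality check.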