Can anyone suggest a better (or the preferred) method for finding the match percentage between two strings, i.e. how closely two strings (e.g. names) are related, expressed as a percentage, using fuzzy logic? Can anyone help me write the code? I'm really not sure where to start.
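$str1 = 'Hello';
$str2 = 'Hello, World!';
// similar_text() returns the number of matching characters and writes the
// similarity percentage into its third argument, which is passed by reference.
similar_text($str1, $str2, $percentage);
echo round($percentage, 2);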
I just wrote a string comparison function based on words, not characters - here it is, just in case anyone needs it:
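// Map each distinct word of $s to the number of times it occurs.
function wordsof($s) {
    $a = [];
    foreach (explode(" ", $s) as $w) $a[$w] = ($a[$w] ?? 0) + 1;
    return $a;
}
function compare($s1, $s2) {
    $w1 = wordsof($s1); if (!$w1) return 0;
    $w2 = wordsof($s2); if (!$w2) return 0;
    // Total length of the distinct words of both strings; ?: 1 avoids division by zero.
    $totalLength = strlen(join("", array_keys($w1)) . join("", array_keys($w2))) ?: 1;
    $chDiff = 0;
    // Add up the lengths of the words that appear in only one of the strings.
    foreach ($w1 as $word => $x) if (!isset($w2[$word])) $chDiff += strlen($word);
    foreach ($w2 as $word => $x) if (!isset($w1[$word])) $chDiff += strlen($word);
    return $chDiff / $totalLength;
}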
The logic is simple: it looks for each word of one string in the other, both ways, and adds up the lengths of the words that have no match, so long words weigh more. It gives you a floating point value between 0 and 1, where 0 means both strings contain exactly the same words and 1 means they share none. You may want to normalize the strings before comparison: trim spaces, replace multiple spaces with one, convert to lowercase, etc. Also, it's not very fast, and it's not easy to optimize because every word has to be looked up in the other string.
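The normalization mentioned above could look roughly like this. It is only a sketch: normalizeForCompare is a made-up name, not part of the original function, and it assumes plain single-byte text.

```php
<?php
// Hypothetical helper (not from the original answer): prepares a string
// for a word-based comparison.
function normalizeForCompare(string $s): string {
    $s = trim($s);                        // drop leading and trailing whitespace
    $s = preg_replace('/\s+/', ' ', $s);  // collapse each whitespace run into one space
    return strtolower($s);                // lowercase; mb_strtolower() may suit multibyte text better
}
```

You would then pass both strings through it before comparing them, e.g. compare(normalizeForCompare($a), normalizeForCompare($b)).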
If you don't want to pollute the global namespace, you can implement "wordsof" inside the comparator; it's separate only for readability. The code has also been somewhat simplified, so test it before you use it, but it should do the job. I'm using the original version as we speak.
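For what it's worth, here is one way the "wordsof inside the comparator" idea might look. Treat it as a sketch rather than the original version: compareWords is a hypothetical name (picked so it doesn't clash with compare), and it swaps in the built-ins array_count_values and array_diff_key for the hand-written loops.

```php
<?php
// Sketch only: compareWords is a hypothetical name, not the original function.
function compareWords(string $s1, string $s2): float {
    // Local closure instead of a global wordsof(): maps each distinct word to its count.
    $wordsOf = static function (string $s): array {
        return array_count_values(explode(' ', $s));
    };
    $w1 = $wordsOf($s1);
    $w2 = $wordsOf($s2);
    // Total length of the distinct words of both strings; ?: 1 avoids division by zero.
    $totalLength = strlen(implode('', array_keys($w1)) . implode('', array_keys($w2))) ?: 1;
    $chDiff = 0;
    // Words that occur in only one string contribute their length to the difference.
    foreach (array_keys(array_diff_key($w1, $w2)) as $word) $chDiff += strlen((string) $word);
    foreach (array_keys(array_diff_key($w2, $w1)) as $word) $chDiff += strlen((string) $word);
    return $chDiff / $totalLength;
}
```

With this version, compareWords('hello world', 'hello there') gives 0.5: "world" and "there" are unmatched (10 characters) out of 20 characters of distinct words in total.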