I have a filter that blocks certain words such as ass, jerk, etc. I keep a list of such words, compare the input against it, and block any word that appears in the list.
Some users write them as je*k or f*ck, so they do not match the list and the words appear on the page. Is there any way to overcome this exploit?
Similarly, for shit a user writes sh/t. How can I handle this issue?
This is a function that I use in my framework for comments etc. It loads a large list of filtered words and replaces each match with *. It matches on word boundaries so that words like chickenjerk are not replaced. In the word list I use * as the wildcard letter, and when I detect a new exploit I just add it to the list.
/**
 * Swear word filtering function, requires a list of words.
 * The second parameter reveals the first n letters of each match.
 *
 * @param string $str
 * @param int|null $reveal
 * @return string
 */
function swear_filter($str, $reveal = null) {
    // Load words from a file, trimming any whitespace
    //$list = array_filter(array_map('trim', file('./path/to/badwords.txt')));
    $list = array('ass', 'jerk', 'je*k', 'f*ck', 'sh/t', 'sh*t'); // << comment this out when you set the path to the word list

    // Escape each word, then turn the escaped wildcard \* into "any one non-space character".
    // A bare * in a regex is a quantifier, so je*k would not match je*k as typed.
    $words = array();
    foreach ($list as $word) {
        $words[] = str_replace('\*', '\S', preg_quote($word, '/'));
    }
    $pattern = '/\b(' . implode('|', $words) . ')\b/ui';
    $reveal = is_numeric($reveal) ? (int) $reveal : 0;

    // The /e modifier was removed in PHP 7, so use a callback instead
    return preg_replace_callback($pattern, function ($m) use ($reveal) {
        $len = strlen($m[1]);
        $keep = min($reveal, $len);
        return substr($m[1], 0, $keep) . str_repeat('*', $len - $keep);
    }, $str);
}

// I like chickenjerk, you **** **** ***.
echo swear_filter('I like chickenjerk, you jerk sh/t ass.');
// I like chickenjerk, you j*** s*** a**.
echo swear_filter('I like chickenjerk, you jerk sh/t ass.', 1); // with reveal
Hope it helps.
If you are checking individual words, you can use levenshtein():
if (!ctype_alpha($text) && levenshtein('shit', $text) === 1) {
    // match: $text contains a non-letter and is one edit away from 'shit'
}
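To apply the same check to a whole comment rather than a single word, something like the following might work. This is only a sketch: the helper name find_disguised_words, the list of punctuation it trims, and the sample word list are assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: returns the tokens that contain a non-letter
// character and are exactly one edit away from a blocked word.
function find_disguised_words(string $text, array $blocked): array
{
    $hits = [];
    // Split on whitespace so tokens such as "sh/t" stay in one piece
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $token) {
        // Drop punctuation at the ends of the token, e.g. the comma in "je*k,"
        $token = strtolower(trim($token, ".,!?;:\"'()"));
        if ($token === '' || ctype_alpha($token)) {
            continue; // plain words are left to the exact-match list
        }
        foreach ($blocked as $word) {
            if (levenshtein($word, $token) === 1) {
                $hits[] = $token;
                break; // one hit per token is enough
            }
        }
    }
    return $hits;
}

// Flags "je*k" and "sh/t"; "You", "that" and "is" are skipped as plain words
print_r(find_disguised_words('You je*k, that is sh/t!', ['jerk', 'shit']));
```

A distance of exactly 1 only catches a single substituted, added or dropped character, so a token like s#!t would still get through. Raising the threshold catches more but also increases false positives on short words.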