I need to delete words that contain the letters "ц", "щ", "ы", "ь". I wrote the functions below for this, but they are slow.
```php
// Returns true if $text contains at least one of the listed letters
public function CheckToInsert($text)
{
    $xarfho = array("ц", "щ", "ы", "ь","қ","ӣ","ғ","ҷ","ҳ","ӯ","Қ","Ӣ","Ғ","Ҷ","Ҳ","Ӯ");
    foreach ($xarfho as $xarf)
    {
        if (stripos($text,$xarf) !== false)
        {
            return true;
        }
    }
    return false;
}
// Removes the words for which CheckToInsert() returns false and the words with strlen() < 3
public function UnsetUncorrectWords($words)
{
    foreach ($words as $key => $value)
    {
        if($this->CheckToInsert($value) == false) unset($words[$key]);
        if(strlen($value) < 3) unset($words[$key]);
    }
    return $words;
}
```
You may use preg_grep to get either the array items that contain a regex match or, with the PREG_GREP_INVERT flag, those that do not contain a match.
So, to get all the items that contain none of your letters, use:
```php
$xarfho = array("ц", "щ", "ы", "ь","қ","ӣ","ғ","ҷ","ҳ","ӯ","Қ","Ӣ","Ғ","Ҷ","Ҳ","Ӯ");
$wrds = array('Еыфвҷ','цӣвееп','аааа');
$pat = '/[' . implode("", $xarfho) . ']/u';
$res = preg_grep($pat, $wrds, PREG_GREP_INVERT);
// => Array ( [2] => аааа )
```
To get the items that contain at least one of the letters "ц", "щ", "ы", "ь", "қ", "ӣ", "ғ", "ҷ", "ҳ", "ӯ", "Қ", "Ӣ", "Ғ", "Ҷ", "Ҳ", "Ӯ", drop the flag:
```php
$xarfho = array("ц", "щ", "ы", "ь","қ","ӣ","ғ","ҷ","ҳ","ӯ","Қ","Ӣ","Ғ","Ҷ","Ҳ","Ӯ");
$wrds = array('Еыфвҷ','цӣвееп','аааа');
$pat = '/[' . implode("", $xarfho) . ']/u';
$res = preg_grep($pat, $wrds);
// => Array ( [0] => Еыфвҷ [1] => цӣвееп )
```
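The resulting regex looks like /[цщы]/u, where [...] is a character class that matches any single character (or range of characters) listed inside it. The u modifier is required because the pattern contains non-ASCII characters: it makes the regex engine treat both the pattern and the input strings as UTF-8, so each letter is matched as one character. Without it, every letter is split into its separate bytes inside the class, and the class can match words that merely share a byte with one of the letters.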
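To see why the u modifier matters, here is a small check (a sketch; it assumes the PHP source file is saved as UTF-8, and the test word 'рука' is an illustrative example, not from the question). In UTF-8, 'р' is the byte pair D1 80, and D1 is also the first byte of "ц", "щ", "ы" and "ь", so a byte-level class matches it by mistake:

```php
<?php
// Letters from the question; 'рука' contains none of them.
$letters = ['ц', 'щ', 'ы', 'ь', 'қ', 'ӣ', 'ғ', 'ҷ', 'ҳ', 'ӯ', 'Қ', 'Ӣ', 'Ғ', 'Ҷ', 'Ҳ', 'Ӯ'];
$class = '[' . implode('', $letters) . ']';

$word = 'рука';
// Without /u the class is a set of raw bytes; 'р' (D1 80) shares the
// lead byte D1 with 'ц', 'щ', 'ы' and 'ь', so this reports a match.
$withoutU = preg_match('/' . $class . '/', $word);
// With /u each letter is a single code point, so there is no match.
$withU = preg_match('/' . $class . '/u', $word);

echo "without u: $withoutU, with u: $withU\n"; // without u: 1, with u: 0
```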
I suggest rewriting your function (or dropping the function altogether) like this:
```php
public function UnsetUncorrectWords($words)
{
    return preg_grep('~\A[^қӣғҷҳӯҚӢҒҶҲӮ]{3,}\z~u', $words);
}
```
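preg_grep returns only the array items that match the pattern (keeping their original keys), so the items that don't match are filtered out.
The pattern describes words of at least 3 characters that contain none of the letters қ, ӣ, ғ, ҷ, ҳ, ӯ, Қ, Ӣ, Ғ, Ҷ, Ҳ, Ӯ: \A and \z anchor it to the whole string, and with the u modifier {3,} counts characters rather than bytes.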
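A quick way to try the suggested pattern outside a class is to wrap it in a plain function. In this sketch unsetIncorrectWords is a hypothetical name, the extra words 'аб' and 'рука' are illustrative, and the source file is assumed to be UTF-8:

```php
<?php
// Hypothetical standalone version of the method above, for testing.
function unsetIncorrectWords(array $words): array
{
    // Keep only whole words (\A...\z) of 3+ characters that contain none
    // of the listed letters; /u makes {3,} count characters, not bytes.
    return preg_grep('~\A[^қӣғҷҳӯҚӢҒҶҲӮ]{3,}\z~u', $words);
}

$words = ['Еыфвҷ', 'цӣвееп', 'аааа', 'аб', 'рука'];
$result = unsetIncorrectWords($words);
print_r($result);
// => Array ( [2] => аааа [4] => рука )
```

'Еыфвҷ' and 'цӣвееп' are dropped because they contain ҷ and ӣ, and 'аб' is dropped because it has only 2 characters. The surviving items keep their original keys 2 and 4.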
Note that you can't use strlen with multibyte characters: it returns the number of bytes, not the number of characters. Use mb_strlen($value, 'UTF-8') instead, or let the {3,} quantifier in a /u pattern do the counting.
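The difference is easy to check on a two-letter word (a sketch assuming a UTF-8 source file and the mbstring extension, which provides mb_strlen):

```php
<?php
$word = 'аб'; // two Cyrillic letters, each encoded as two bytes in UTF-8

$bytes = strlen($word);             // 4: counts bytes
$chars = mb_strlen($word, 'UTF-8'); // 2: counts characters

// A check like `strlen($value) < 3` therefore keeps this two-letter word,
// while the character-based check removes it.
var_dump($bytes < 3, $chars < 3); // bool(false) bool(true)
```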