I am trying to extract a word from a string, but the word may contain Cyrillic characters, and nothing I have tried works.
Please help me. My code:
$str= "Продавец:В KrossАдын рассказать друзьям var addthis_config = {'data_track_clickback':true};";
$pattern = '/\s(\w*|.*?)\s/';
preg_match($pattern, $str, $matches);
echo $matches[0];
I need to get KrossАдын.
Thanks!
The issue is that your string contains multi-byte UTF-8 characters, which \w does not match by default. Check this answer on StackOverflow for a solution: UTF-8 in PHP regular expressions
Essentially, you'll want to add the u modifier at the end of your expression, and use \p{L} instead of \w.
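As a minimal sketch of that suggestion, the snippet below applies the u modifier and \p{L} to the string from the question. The capture group and the use of $matches[1] (rather than $matches[0], which would include the surrounding whitespace) are illustrative choices, not something the linked answer prescribes.

```php
<?php
// String taken from the question.
$str = "Продавец:В KrossАдын рассказать друзьям var addthis_config = {'data_track_clickback':true};";

// \p{L} matches any Unicode letter; the u modifier makes PCRE treat
// both the pattern and the subject as UTF-8.
$pattern = '/\s(\p{L}+)\s/u';

$word = null;
if (preg_match($pattern, $str, $matches)) {
    // Group 1 holds the word without the whitespace on either side.
    $word = $matches[1];
    echo $word, "\n";
}
```

Here the first run of letters that sits between two whitespace characters is the second space-separated token of the string.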
You can change the meaning of \w by using the u modifier. With the u modifier, the string is read as a UTF-8 string, and the \w character class is no longer [a-zA-Z0-9_] but [\p{L}\p{N}_]:
$pattern = '/\s(\w*|.*?)\s/u';
Note that the alternation in the pattern is redundant: the second branch can match everything the first one can (i.e. everything matched by \w* can also be matched by .*? because a whitespace character must follow on the right; both subpatterns match the characters between two whitespace characters).
Writing $pattern = '/\s(.*?)\s/u';
does exactly the same, or better:
$pattern = '/\s(\S*)\s/u';
which avoids the lazy quantifier.
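To check that claim against the string from the question, here is a small sketch that runs the three patterns side by side. The $patterns and $results arrays are names introduced only for this illustration.

```php
<?php
// String taken from the question.
$str = "Продавец:В KrossАдын рассказать друзьям var addthis_config = {'data_track_clickback':true};";

$patterns = [
    '/\s(\w*|.*?)\s/u', // original pattern, with the u modifier added
    '/\s(.*?)\s/u',     // alternation removed, still lazy
    '/\s(\S*)\s/u',     // greedy \S*, no lazy quantifier
];

$results = [];
foreach ($patterns as $pattern) {
    if (preg_match($pattern, $str, $matches)) {
        // $matches[0] would include the surrounding whitespace;
        // $matches[1] is only the captured word.
        $results[] = $matches[1];
        echo $pattern, ' => ', $matches[1], "\n";
    }
}
```

On this input all three patterns capture the same word, the second space-separated token of the string.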
If your goal is only to match ASCII and Cyrillic letters, the most efficient option (because a smaller character class is faster to test) will be:
$pattern = '~(*UTF8)[a-z\p{Cyrillic}]+~i';
(*UTF8) tells the regex engine that the subject string must be read as a UTF-8 string.
\p{Cyrillic} is a character class that contains only Cyrillic characters.
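A rough sketch of how that character class behaves on the string from the question follows. It swaps the (*UTF8) prefix for the u modifier, on the assumption that the modifier is accepted by more PHP/PCRE versions. It also uses preg_match_all to list every run of letters, which is an illustrative choice rather than part of the pattern above.

```php
<?php
// String taken from the question.
$str = "Продавец:В KrossАдын рассказать друзьям var addthis_config = {'data_track_clickback':true};";

// ASCII letters (case-insensitive via i) plus characters of the Cyrillic script.
// Assumption: the u modifier is used here instead of the (*UTF8) prefix.
$pattern = '~[a-z\p{Cyrillic}]+~iu';

// Collect every maximal run of such letters; digits, punctuation,
// underscores and whitespace act as separators.
preg_match_all($pattern, $str, $words);

// $words[0] is the list of full matches, in order of appearance.
print_r($words[0]);

// The mixed Latin/Cyrillic word is the third run in this particular string.
echo $words[0][2], "\n";
```

Because this pattern has no surrounding \s, it matches every word in the string, so the caller still has to pick the wanted one (here by position).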