This is my current regex code to validate English letters and numbers:
const CANONICAL_FMT = '[0-9a-z]{1,64}';
public static function isCanonical($str)
{
    return preg_match('/^(?:' . self::CANONICAL_FMT . ')$/', $str);
}
Pretty straightforward. Now I want to change it to validate only Hebrew, underscores and numbers, so I changed the code to:
public static function isCanonical($str)
{
    return preg_match('/^(?:[\u0590-\u05FF\uFB1D-\uFB40]+|[\w]+)$/i', $str);
}
But it doesn't work. I basically took the Hebrew Unicode ranges from Wikipedia. What is wrong here?
I was able to get it to work much more easily, using the /u flag and the \p{Hebrew} Unicode character property:
return preg_match('/^(?:\p{Hebrew}+|\w+)$/iu', $str);
Working example: http://ideone.com/gSlmh
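Note that with the /u flag, \w also accepts Latin letters, which the question wanted to exclude. A minimal sketch of a stricter variant follows, assuming the goal is Hebrew-script characters, ASCII digits and underscore only, with the original 1 to 64 length limit. The names HEBREW_CANONICAL_FMT and isHebrewCanonical are made up for illustration, and the D modifier is an addition that stops $ from matching before a trailing newline.

```php
<?php
// Hypothetical stricter format: Hebrew-script characters, ASCII digits, underscore.
const HEBREW_CANONICAL_FMT = '[\p{Hebrew}0-9_]{1,64}';

function isHebrewCanonical(string $str): bool
{
    // u: treat pattern and subject as UTF-8, so {1,64} counts code points, not bytes.
    // D: make $ match only at the very end of the subject.
    return preg_match('/^(?:' . HEBREW_CANONICAL_FMT . ')$/Du', $str) === 1;
}

// "\u{...}" escapes (PHP 7+) keep this file independent of its own encoding.
var_dump(isHebrewCanonical("\u{05E9}\u{05DC}\u{05D5}\u{05DD}_123")); // bool(true)
var_dump(isHebrewCanonical('shalom'));                               // bool(false)
```

Comparing the result with === 1 also turns a failed match or a regex error into a plain false.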
If you want preg_match() to work properly with UTF-8, you might have to enable the u modifier (quoting):
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8.
In your case, instead of using the following regex:
/^(?:[\u0590-\u05FF\uFB1D-\uFB40]+|[\w]+)$/i
I suppose you'd be using:
/^(?:[\x{0590}-\x{05FF}\x{FB1D}-\x{FB40}]+|[\w]+)$/iu
(Note the additional u at the end. PCRE also has no \uXXXX escape, so the code points are written as \x{XXXX}.)
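PCRE, the engine behind preg_match(), does not accept the \uXXXX escape. Code points are written as \x{XXXX}, and values above 0xFF need the u modifier. Here is a small sketch using the ranges from the question. The helper name isHebrewByRange is hypothetical, and adding digits and underscore to the class is an assumption based on the stated goal.

```php
<?php
// Hypothetical helper: explicit code point ranges instead of \p{Hebrew}.
function isHebrewByRange(string $str): bool
{
    // \x{0590}-\x{05FF}: Hebrew block; \x{FB1D}-\x{FB40}: range quoted in the question.
    return preg_match('/^[\x{0590}-\x{05FF}\x{FB1D}-\x{FB40}0-9_]+$/u', $str) === 1;
}

// The \uXXXX form does not compile, so preg_match() returns false
// (the compile warning is suppressed with @ for this demonstration only).
var_dump(@preg_match('/^[\u0590-\u05FF]+$/u', 'x')); // bool(false)
var_dump(isHebrewByRange("\u{05D0}\u{05D1}_7"));     // bool(true)
```

A false return value (as opposed to 0) is how preg_match() reports a pattern that failed to compile, which is why the original code appeared not to work.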
You need the /u modifier to add support for UTF-8.
Make sure you convert your Hebrew input to UTF-8 if it's in some other code page or character set.
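One way to do that conversion is sketched below, assuming the legacy input is ISO-8859-8 and that the iconv extension is available. The helper name toUtf8 and the default charset are assumptions, so substitute whatever encoding your data really uses.

```php
<?php
// Hypothetical helper: return UTF-8 text, converting from an assumed legacy charset if needed.
function toUtf8(string $bytes, string $assumedCharset = 'ISO-8859-8'): string
{
    // An empty pattern with /u matches only when the subject is valid UTF-8.
    if (preg_match('//u', $bytes) === 1) {
        return $bytes;
    }
    $converted = iconv($assumedCharset, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new InvalidArgumentException("Cannot convert input from $assumedCharset");
    }
    return $converted;
}

$legacy = "\xE0\xE1"; // alef, bet encoded as ISO-8859-8
var_dump(toUtf8($legacy) === "\u{05D0}\u{05D1}"); // bool(true)
```

The validity check is only a heuristic: a legacy byte sequence can occasionally happen to be valid UTF-8, so knowing the source encoding up front is more reliable than guessing.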