Hebrew regex match not working in PHP

This is my current regex code to validate English letters and numbers:

const CANONICAL_FMT = '[0-9a-z]{1,64}';

public static function isCanonical($str)
{
    return preg_match('/^(?:' . self::CANONICAL_FMT . ')$/', $str);
}

Pretty straightforward. Now I want to change it to accept only Hebrew letters, underscores and numbers, so I changed the code to:

public static function isCanonical($str)
{
    return preg_match('/^(?:[\u0590-\u05FF\uFB1D-\uFB40]+|[\w]+)$/i', $str);
}

But it doesn't work. I basically took the Hebrew Unicode range from Wikipedia. What is wrong here?

I was able to get it to work much more easily, using the /u flag and the \p{Hebrew} Unicode character property:

return preg_match('/^(?:\p{Hebrew}+|\w+)$/iu', $str);

Working example: http://ideone.com/gSlmh
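
Note that the pattern above still accepts Latin letters through \w. If the goal is strictly Hebrew, digits and underscore, a tighter character class may be closer to what the question asks for. The following is only a sketch: the function name isHebrewCanonical is hypothetical, the {1,64} length limit is borrowed from the question's CANONICAL_FMT, and \p{Hebrew} covers the whole Hebrew script (vowel points and a few punctuation marks included), not just the letters.

```php
<?php
// Hypothetical stricter validator: Hebrew script characters,
// ASCII digits and underscore only, 1 to 64 code points long.
function isHebrewCanonical(string $str): bool
{
    // u: treat pattern and subject as UTF-8 (enables \p{...}).
    // D: make $ match only at the very end, not before a trailing newline.
    return preg_match('/^[\p{Hebrew}0-9_]{1,64}$/uD', $str) === 1;
}

var_dump(isHebrewCanonical('שלום_123')); // bool(true)
var_dump(isHebrewCanonical('hello'));    // bool(false)
var_dump(isHebrewCanonical(''));         // bool(false)
```

The comparison with 1 turns the result into a real boolean, since preg_match() returns 1, 0, or false on error.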

If you want preg_match() to work properly with UTF-8, you might have to enable the u modifier (quoting the PHP manual):

This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8.


In your case, instead of using the following regex:

/^(?:[\u0590-\u05FF\uFB1D-\uFB40]+|[\w]+)$/i

I suppose you'd be using:

/^(?:[\x{0590}-\x{05FF}\x{FB1D}-\x{FB40}]+|[\w]+)$/iu

(Note the additional u at the end. PCRE also has no \uXXXX escape, so the code points are written as \x{XXXX}.)
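
For anyone who prefers explicit ranges over a Unicode property, here is a minimal sketch using the ranges from the question. PCRE does not understand \uXXXX and fails to compile a pattern containing it; code points are written as \x{XXXX}, and values above 0xFF need the u modifier. The function name matchesHebrewRange is made up for illustration.

```php
<?php
// Hypothetical helper: true if $str consists only of characters
// in the two ranges quoted in the question.
function matchesHebrewRange(string $str): bool
{
    // Single quotes pass \x{...} through to PCRE untouched.
    $pattern = '/^[\x{0590}-\x{05FF}\x{FB1D}-\x{FB40}]+$/u';
    return preg_match($pattern, $str) === 1;
}

var_dump(matchesHebrewRange('שלום'));   // bool(true)
var_dump(matchesHebrewRange('shalom')); // bool(false)
```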

You need the /u modifier to add support for UTF-8.

Make sure you convert your Hebrew input to UTF-8 if it's in some other code page or character set.
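
As a rough sketch of that conversion, assuming the mbstring extension is available and that non-UTF-8 input arrives as ISO-8859-8 (both are assumptions, and the function name toUtf8 is hypothetical), something like this could run before the regex:

```php
<?php
// Hypothetical helper: return $input as UTF-8.
// $assumedEncoding is a guess about the legacy source encoding.
function toUtf8(string $input, string $assumedEncoding = 'ISO-8859-8'): string
{
    // Already valid UTF-8: leave it alone.
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    // Otherwise reinterpret the bytes using the assumed legacy encoding.
    return mb_convert_encoding($input, 'UTF-8', $assumedEncoding);
}

// "שלום" encoded as ISO-8859-8 bytes.
$legacy = "\xF9\xEC\xE5\xED";
var_dump(preg_match('/^\p{Hebrew}+$/u', toUtf8($legacy))); // int(1)
```

The validity check is only a heuristic: it cannot tell which legacy encoding the bytes were in, so the source encoding should come from wherever the data originates (form charset, database collation, file metadata) whenever that is known.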