This is what I've found in the Kohana3 validator rules:
    public static function digit($str, $utf8 = FALSE)
    {
        if ($utf8 === TRUE)
        {
            return (bool) preg_match('/^\pN++$/uD', $str);
        }
        else
        {
            return (is_int($str) AND $str >= 0) OR ctype_digit($str);
        }
    }
Can someone give an example where passing the $utf8 parameter as true and as false gives different results (to be precise: false positives for $utf8 == false)?
From what I remember, digits are ASCII-safe characters, and no multi-byte UTF-8 sequence can be confused with them.
PS: to be even more specific, is it possible to fool this check by passing something that would not look like a number when read as UTF-8, but would still pass the check with $utf8 == false?
Just gave the second part of your question a bit more alcohol, and my conclusion is that you can't hide an ASCII digit in a UTF-8 sequence. Digits must be 0x30..0x39, or in the bit range 00110000..00111001.
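Every byte of a multi-byte UTF-8 sequence has its high bit set, while an ASCII digit has it clear. A three-byte sequence, for example, has the form

    1110xxxx 10xxxxxx 10xxxxxx

And therefore a digit's ASCII representation can't match at any position (the arrows mark bits that differ):

    1110xxxx 10xxxxxx 10xxxxxx
    00110000
    ▲▲       00110000 ▼
             ▲        00110000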
So a multi-byte UTF-8 sequence can never pass the check in Latin-1/ASCII mode; only the plain ASCII digits satisfy both that check and \pN in /u mode. Ignoring invalid encodings, of course.
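If you'd rather verify the bit argument than trust it, here is a rough brute-force sketch. The helper names utf8_encode_cp() and has_ascii_byte() are made up for this post (they are not part of Kohana or PHP), and the encoder is hand-written so the script doesn't depend on mbstring. It encodes every code point from U+0080 upwards and counts encodings that contain any byte below 0x80, which would include the digit range 0x30..0x39.

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
function utf8_encode_cp(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: true if any byte has its high bit clear (0x00..0x7F).
function has_ascii_byte(string $bytes): bool
{
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        if (ord($bytes[$i]) < 0x80) {
            return true;
        }
    }
    return false;
}

$offenders = 0;
for ($cp = 0x80; $cp <= 0x10FFFF; $cp++) {
    if ($cp >= 0xD800 && $cp <= 0xDFFF) {
        continue; // surrogates are not valid in UTF-8
    }
    if (has_ascii_byte(utf8_encode_cp($cp))) {
        $offenders++;
    }
}

echo "Multi-byte encodings containing an ASCII byte: $offenders\n";
```

The count should come out as zero, which is the same conclusion as the bit diagram: no valid multi-byte sequence contains a byte that ctype_digit() could accept.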
Even though 0-9 are ASCII-safe, there are many other digits in Unicode.
See Unicode Characters in the 'Number, Decimal Digit' Category
for a list. Some examples are U+0660 ARABIC-INDIC DIGIT ZERO (٠) and U+1D7EC MATHEMATICAL SANS-SERIF BOLD DIGIT ZERO (𝟬), etc.
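To see the two modes disagree, here is a small sketch. digit_check() is a standalone copy of the check quoted in the question, renamed for this post so it runs outside Kohana; the sample strings are my own picks. It assumes PHP 7+ (for the "\u{...}" escapes) and a PCRE build with Unicode property support. Note that \pN covers the whole Number category, not only decimal digits, so a Roman numeral character is included as well.

```php
<?php
// Standalone copy of the quoted check (hypothetical name, not the Kohana method itself).
function digit_check($str, $utf8 = false)
{
    if ($utf8 === true) {
        // Any run of characters from the Unicode "Number" category.
        return (bool) preg_match('/^\pN++$/uD', $str);
    }
    // Non-negative integers, or strings made only of ASCII digits.
    return (is_int($str) && $str >= 0) || ctype_digit($str);
}

$samples = [
    'ascii'        => '123',
    'arabic-indic' => "\u{0661}\u{0662}\u{0663}", // ARABIC-INDIC DIGITS ONE, TWO, THREE
    'math bold'    => "\u{1D7EC}",                // MATHEMATICAL SANS-SERIF BOLD DIGIT ZERO
    'roman'        => "\u{2163}",                 // ROMAN NUMERAL FOUR (a letter number, not a decimal digit)
];

foreach ($samples as $label => $s) {
    printf(
        "%-12s utf8=false: %-5s utf8=true: %s\n",
        $label,
        var_export(digit_check($s), true),
        var_export(digit_check($s, true), true)
    );
}
```

With those assumptions, only the plain ASCII sample should pass in both modes; the other three should be rejected with $utf8 == false and accepted with $utf8 == true. So the difference goes one way: the UTF-8 mode accepts more, and the ASCII mode gives no false positives.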