Can you please explain how this line of code is equivalent to the code that follows it?
<?php
$string = chr( ( $number >> 6 ) + 192 ).chr( ( $number & 63 ) + 128 );
?>
It's equivalent to:
if ( $number >= 128 && $number <= 2047 ) {
    $byte1 = 192 + (int)($number / 64); // = 192 + ( $number >> 6 )
    $byte2 = 128 + ($number % 64);      // = 128 + ( $number & 63 )
    $utf = chr($byte1).chr($byte2);
}
For example, with the number 1989 both produce ߅
This code is used for converting Unicode entities back to the original UTF-8 characters.
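To check the claimed equivalence rather than take it on faith, here is a minimal sketch that runs both versions over the whole two-byte range (128 to 2047) and counts any differences. The function names `encodeWithBitOps` and `encodeWithArithmetic` are made up for this illustration and are not part of the original code.

```php
<?php
// Hypothetical helper names, used only for this comparison.
function encodeWithBitOps(int $number): string
{
    return chr(($number >> 6) + 192) . chr(($number & 63) + 128);
}

function encodeWithArithmetic(int $number): string
{
    $byte1 = 192 + (int)($number / 64); // same value as 192 + ($number >> 6)
    $byte2 = 128 + ($number % 64);      // same value as 128 + ($number & 63)
    return chr($byte1) . chr($byte2);
}

// Compare the two versions for every code point in the two-byte range.
$mismatches = 0;
for ($number = 128; $number <= 2047; $number++) {
    if (encodeWithBitOps($number) !== encodeWithArithmetic($number)) {
        $mismatches++;
    }
}

echo $mismatches, "\n";                      // 0
echo bin2hex(encodeWithBitOps(1989)), "\n";  // df85
```

The bytes DF 85 are the UTF-8 encoding of U+07C5, which is the ߅ character from the question.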
The code on top uses bitwise operators. >> is the right shift operator. It shifts the bits in the number to the right (towards the less significant bits).
So 11110000 >> 2 = 00111100
For non-negative integers it's equivalent to integer division by powers of 2: $number >> $n is the same as (int)($number / pow(2,$n)).
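A small sketch of the shift, assuming a non-negative integer (for negative numbers a right shift and a truncating division can differ). The variable names are only for illustration, and `intdiv` requires PHP 7 or later.

```php
<?php
$number = 1989;
$n = 6;

$shifted = $number >> $n;              // drop the lowest 6 bits
$divided = intdiv($number, 2 ** $n);   // integer division by 64

echo $shifted, "\n";                   // 31
var_dump($shifted === $divided);       // bool(true)

// The binary example from the text, printed as 8-bit strings.
printf("%08b >> 2 = %08b\n", 0b11110000, 0b11110000 >> 2);
// 11110000 >> 2 = 00111100
```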
The & is the "bitwise and" operator. It compares the respective bits of both numbers and sets in the result only those that are 1 in both numbers.
11110000 & 01010101 = 01010000
By and'ing $number with 63 (00111111) you get the remainder of dividing $number by 64 (aka the modulus), which is written $number % 64.
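The same idea as a short sketch, again assuming a non-negative integer: masking with 63 keeps only the lowest six bits, which is exactly the remainder after dividing by 64. Variable names are illustrative.

```php
<?php
$number = 1989;

$masked    = $number & 63;   // keep the lowest 6 bits
$remainder = $number % 64;   // remainder of division by 64

echo $masked, "\n";                    // 5
var_dump($masked === $remainder);      // bool(true)

// The binary example from the text, printed as 8-bit strings.
printf("%08b & %08b = %08b\n", 0b11110000, 0b01010101, 0b11110000 & 0b01010101);
// 11110000 & 01010101 = 01010000
```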
$number >> 6 is a binary shift-right operation, i.e. 11000000 >> 6 == 00000011, equivalent to (int)($number / pow(2,6)), aka integer division of $number by 64.
$number & 63 is a binary AND with 00111111.
Both are typically at least as fast when done as bitwise operations, and the substitution only works because they deal with powers of two.
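Adding to @Mchl's answer: the reason for adding 192 to the first byte of the UTF-8 sequence is to signal the start of a multi-byte sequence and how many bytes it contains. The 128 (10000000) added to the second byte marks it as a continuation byte.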
192 - 11000000 - Start of 2 Byte sequence ( 128 + 64)
224 - 11100000 - Start of 3 Byte sequence ( 128 + 64 + 32)
240 - 11110000 - Start of 4 Byte sequence ( 128 + 64 + 32 + 16)
248 - 11111000 - Start of 5 Byte sequence (Restricted) (... + 8)
252 - 11111100 - Start of 6 Byte sequence (Restricted) (... + 4)
254 - 11111110 - Invalid
Table Reference : https://en.wikipedia.org/w/index.php?title=UTF-8&oldid=388157043
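As a rough sketch of how the table can be read in code, the function below inspects the high bits of a lead byte and returns the sequence length. The name `sequenceLengthFromLeadByte` is hypothetical, and treating the restricted 5 and 6 byte forms as invalid (returning null) is an assumption of this example.

```php
<?php
// Hypothetical helper: returns the UTF-8 sequence length announced by a
// lead byte, or null for continuation bytes and restricted/invalid values.
function sequenceLengthFromLeadByte(int $byte): ?int
{
    if (($byte & 0x80) === 0x00) {
        return 1; // 0xxxxxxx: plain ASCII
    }
    if (($byte & 0xE0) === 0xC0) {
        return 2; // 110xxxxx: 192 + payload
    }
    if (($byte & 0xF0) === 0xE0) {
        return 3; // 1110xxxx: 224 + payload
    }
    if (($byte & 0xF8) === 0xF0) {
        return 4; // 11110xxx: 240 + payload
    }
    return null;  // 10xxxxxx continuation byte, or 248 and above
}

$utf = chr((1989 >> 6) + 192) . chr((1989 & 63) + 128);

var_dump(sequenceLengthFromLeadByte(ord($utf[0]))); // int(2)
var_dump(sequenceLengthFromLeadByte(ord($utf[1]))); // NULL
```

The first byte of the 1989 example is 223 (11011111), so it falls in the 2 byte row; the second byte is 133 (10000101), a continuation byte.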