Listing UTF-8 non-ASCII occurrences does not work [closed]

The PHP library lacks an mb_ord() function, that is, something that does what the ord() function does, but for UTF-8 (or "mb", multibyte, hence "mb_ord"). I used some clues from here,

 $ord = hexdec( bin2hex($utf8char) ); //decimal 

and I suppose that mb_substr($text, $i, 1, 'UTF-8') gets one UTF-8 character. But $ord does not return the values that we expect.

CONTEXT

This code does not work: it does not show codes such as 177 (plusmn).

 $msg = '';
 $text = "... a UTF-8 long text... Ą ⨌ 2.5±0.1; 0.5±0.2 ...";
 $allOrds = array(); 
 for($i=0; $i<mb_strlen($text, 'UTF-8'); $i++) {
    $utf8char = mb_substr($text, $i, 1,  'UTF-8'); // 1=1 unicode character?
    $ord = hexdec( bin2hex($utf8char) ); //decimal 
    if ($ord>126) { //non-ASCII
      if (isset($allOrds[$ord])) $allOrds[$ord]++; else $allOrds[$ord]=1;
    }
 }
 foreach($allOrds as $o=>$n)
    $msg.="\n entity #$o occurs $n times";
 echo $msg;

OUTPUT

entity #50308 occurs 1 times
entity #14854284 occurs 1 times
entity #49841 occurs 2 times

So (see the entities table), 49841 is not the expected 177 (plusmn), and 14854284 is not the expected 10764 (iiiint).

something that does what the ord() function does, but for UTF-8

For that you'd first need to define what exactly that is. ord gives you the numerical value of a single byte (the first byte of the string you pass it). This is often mistaken for the "value of the character", but that only holds in single-byte encodings, where one character is one byte; in a multibyte encoding a character has no single byte value. So, ord == numerical value of a byte. What exactly would you expect the "MB version of ord" to do, then?
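To make the "value of a byte" point concrete, here is a small sketch. It assumes the character "Ą" from the question's sample text, written out as its two UTF-8 bytes so the example does not depend on the encoding of the source file.

```php
<?php
// "Ą" (U+0104) written explicitly as its two UTF-8 bytes.
$char = "\xC4\x84";

$byteCount  = strlen($char);   // strlen counts bytes, not characters
$firstByte  = ord($char);      // ord only looks at the first byte
$secondByte = ord($char[1]);   // the second byte has to be asked for explicitly

printf("%d bytes: %d (0x%X), %d (0x%X)\n",
    $byteCount, $firstByte, $firstByte, $secondByte, $secondByte);
```

Neither byte value is 260, the code point of "Ą", which is why ord alone cannot answer the question.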

Anyway, what you're getting is the numeric value of two (or more) bytes read as one number. Say, the character "漢" is encoded in UTF-8 as the three bytes E6 BC A2. That's what bin2hex gives you. hexdec then translates the hex string E6BCA2 to decimal, which is a pretty large number (15121570). That number is not the Unicode code point U+6F22 (decimal 28450), which is what you're really after. The reason is that UTF-8 does not store the code point's bits as they are: it spreads them over several bytes and prefixes each byte with marker bits, hence U+6F22 (漢) does not translate into the bytes 6F 22.
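As a rough illustration of the difference, the sketch below decodes the three bytes of "漢" by hand. It only handles the three-byte layout (1110xxxx 10xxxxxx 10xxxxxx) and is meant to show where the marker bits go, not to be a general decoder.

```php
<?php
// "漢" (U+6F22) written explicitly as its three UTF-8 bytes.
$char = "\xE6\xBC\xA2";

// What the question's code computes: all bytes read as one big number.
$bytesAsNumber = hexdec(bin2hex($char));

// Manual decoding of a three-byte sequence: strip the marker bits
// (1110.... on the first byte, 10...... on the others) and join the rest.
$b1 = ord($char[0]);
$b2 = ord($char[1]);
$b3 = ord($char[2]);
$codePoint = (($b1 & 0x0F) << 12) | (($b2 & 0x3F) << 6) | ($b3 & 0x3F);

printf("bytes as number: %d (0x%X)\n", $bytesAsNumber, $bytesAsNumber);
printf("code point:      %d (U+%04X)\n", $codePoint, $codePoint);
```

The first line reports 15121570 (0xE6BCA2), the second 28450 (U+6F22).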

You have already linked to another question which does what you want:

list(, $ord) = unpack('N', mb_convert_encoding($utf8Character, 'UCS-4BE', 'UTF-8'));

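This essentially uses the same logic as your code, but bases it on the UCS-4 encoding instead of UTF-8. In UCS-4BE every code point is stored as a fixed four-byte big-endian integer, so the bytes read as a number (which is what unpack('N', ...) does) are exactly the code point.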
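Applied to the loop from the question, this could look roughly like the sketch below. The function names utf8_ord and count_non_ascii are made up for this example and are not PHP built-ins; the code assumes the mbstring extension is loaded, PHP 7 or later for the type declarations, and a source file saved as UTF-8.

```php
<?php
// Hypothetical helper (not a built-in): code point of one UTF-8 character.
function utf8_ord(string $utf8Character): int
{
    // UCS-4BE stores each code point as a 32-bit big-endian integer,
    // which unpack('N', ...) reads back directly.
    list(, $ord) = unpack('N', mb_convert_encoding($utf8Character, 'UCS-4BE', 'UTF-8'));
    return $ord;
}

// Hypothetical helper: count how often each non-ASCII code point occurs.
function count_non_ascii(string $text): array
{
    $allOrds = array();
    $length = mb_strlen($text, 'UTF-8'); // computed once, not on every iteration
    for ($i = 0; $i < $length; $i++) {
        $ord = utf8_ord(mb_substr($text, $i, 1, 'UTF-8'));
        if ($ord > 127) { // ASCII covers 0..127
            $allOrds[$ord] = isset($allOrds[$ord]) ? $allOrds[$ord] + 1 : 1;
        }
    }
    return $allOrds;
}

$text = "... a UTF-8 long text... Ą ⨌ 2.5±0.1; 0.5±0.2 ...";
foreach (count_non_ascii($text) as $o => $n) {
    echo "entity #$o occurs $n times\n";
}
```

With the sample text this reports 260 (Ą) once, 10764 (⨌) once and 177 (±) twice. On PHP 7.2 and later, the built-in mb_ord($utf8Character, 'UTF-8') can replace the helper.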