Non-ASCII characters converted to squares

I've got the following code, which searches a string for non-ASCII characters and returns them via an AJAX query.

$asciistring = $strDescription;
for ($i=0; $i<strlen($asciistring); $i++) {  
    if (ord($asciistring[$i]) > 127){
        $display_string .= $asciistring[$i];
    }
}

If $strDescription contains £ (character # 156), the above code works fine. However, I want to separate each non-ASCII character found with a comma. When I modify my code as shown below, the £ character is converted into squares.

$asciistring = $strDescription;
for ($i=0; $i<strlen($asciistring); $i++) {  
    if (ord($asciistring[$i]) > 127){
        $display_string .= $asciistring[$i] . ", ";
    }
}

What am I doing wrong and how do I fix it?

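Here are two ways to do it. In both cases, start by converting the string with utf8_decode (deprecated as of PHP 8.2; mb_convert_encoding($asciistring, 'ISO-8859-1', 'UTF-8') performs the same conversion). You can try this: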

$asciistring = 'a£bÂc£d';
$asciistring =  utf8_decode($asciistring);

First way: preg_match_all

if (preg_match_all('/[\x80-\xFF]/', $asciistring, $matches)) {
    $display_string = implode(',', $matches[0]);
}

Second way: a loop, as you wrote

$display_string = array();
for ($i=0; $i<strlen($asciistring); $i++) {
    if (ord($asciistring[$i]) > 127)
    {
        $display_string[] = $asciistring[$i];
    }
}
$display_string = implode(',', $display_string);

Both give me the same output:

£,Â,£

I hope this helps!

You assume 1 character = 1 byte.

This assumption is wrong when it comes to UTF-8 / UTF-16 etc.

UTF-8 and similar encodings use multi-byte characters: in UTF-8, 1 character = 1 to 4 bytes.

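So your loop over 8-bit bytes cannot handle multi-byte UTF-8 characters: it treats each byte as a character, and inserting a comma between the bytes of one character leaves invalid sequences, which the browser displays as squares.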
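To make the byte-level problem concrete, here is a minimal sketch. It assumes the input is UTF-8 encoded; the variable names $pound and $broken are only for illustration. The byte escape "\xC2\xA3" is used instead of a literal £ so the result does not depend on how the source file is saved.

```php
<?php
// Assumption: the string is UTF-8; "\xC2\xA3" is the UTF-8 byte sequence for £.
$pound = "\xC2\xA3";

echo strlen($pound), "\n";   // 2, because strlen() counts bytes, not characters
echo bin2hex($pound), "\n";  // c2a3

// Same logic as the second loop in the question: append ", " after every byte > 127.
$broken = '';
for ($i = 0; $i < strlen($pound); $i++) {
    if (ord($pound[$i]) > 127) {
        $broken .= $pound[$i] . ', ';
    }
}

// The two bytes of £ are now separated by ", " and no longer form a valid character.
echo bin2hex($broken), "\n"; // c22c20a32c20
```

Without the separator, the two bytes are appended back to back and happen to form a valid £ again, which is why the first version of the loop appeared to work.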

Use the mb_... functions (multibyte string functions) instead.
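One possible way to apply this to the loop from the question is sketched below. It assumes PHP 7 or later with the mbstring extension loaded and UTF-8 input; nonAsciiChars is a hypothetical helper name, not something from the original code.

```php
<?php
// Hypothetical helper: returns the non-ASCII characters of a UTF-8 string,
// separated by ", ". Assumes the mbstring extension is available.
function nonAsciiChars(string $text): string
{
    $found = [];
    $length = mb_strlen($text, 'UTF-8'); // length in characters, not bytes

    for ($i = 0; $i < $length; $i++) {
        $char = mb_substr($text, $i, 1, 'UTF-8'); // one whole character

        // In UTF-8, every character outside ASCII occupies more than one byte.
        if (strlen($char) > 1) {
            $found[] = $char;
        }
    }

    return implode(', ', $found);
}

// "a£bÂc£d" written with byte escapes: £ is C2 A3, Â is C3 82.
$input = "a\xC2\xA3b\xC3\x82c\xC2\xA3d";
echo nonAsciiChars($input), "\n"; // prints: £, Â, £ (on a UTF-8 terminal)
```

Because each character is extracted whole before the separator is added, the comma can never land between the bytes of a single character.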

Additionally, converting between UTF-8 and a single-byte encoding (ASCII, ISO-8859-1 and the like):

  1. is in general not needed
  2. will always lose the characters that are not available in both encodings (the € sign, for example, is not in ISO-8859-1)
  3. will be a maintenance nightmare in the long run

My recommendation: it's worth the effort to switch everything, from development to production, to use UTF-8 throughout. These encoding problems go away afterwards.
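On the PHP side, part of such a switch could look like the sketch below. It is only an illustration under assumptions: the mbstring extension is loaded, the AJAX endpoint returns plain text, and the database connection, HTML pages and source files are set to UTF-8 separately.

```php
<?php
// Assumption: mbstring is loaded. Make UTF-8 the default for the mb_* functions.
mb_internal_encoding('UTF-8');

// Tell the browser how to decode the AJAX response.
// The guard avoids a warning if output has already been sent.
if (!headers_sent()) {
    header('Content-Type: text/plain; charset=utf-8');
}

// Example payload: "£, Â" written as UTF-8 byte escapes.
echo "\xC2\xA3, \xC3\x82";
```

With the charset declared, the client no longer has to guess the encoding of the response.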