I've got the following code, which searches a string for non-ASCII characters and returns them via an AJAX query.
$asciistring = $strDescription;
for ($i = 0; $i < strlen($asciistring); $i++) {
    if (ord($asciistring[$i]) > 127) {
        $display_string .= $asciistring[$i];
    }
}
If $strDescription contains £ (character #156), the above code works fine. However, I want to separate each non-ASCII character found with a comma. When I modify my code as shown below, the £ character turns into squares.
$asciistring = $strDescription;
for ($i = 0; $i < strlen($asciistring); $i++) {
    if (ord($asciistring[$i]) > 127) {
        $display_string .= $asciistring[$i] . ", ";
    }
}
What am I doing wrong and how do I fix it?
There are two ways to do this. In both cases, first convert the string with utf8_decode, then try one of the following:
$asciistring = 'a£bÂc£d';
$asciistring = utf8_decode($asciistring);
First way: preg_match_all
if (preg_match_all('/[\x80-\xFF]/', $asciistring, $matches)) {
    $display_string = implode(',', $matches[0]);
}
Second way: a loop like the one you wrote
$display_string = array();
for ($i = 0; $i < strlen($asciistring); $i++) {
    if (ord($asciistring[$i]) > 127) {
        $display_string[] = $asciistring[$i];
    }
}
$display_string = implode(',', $display_string);
Both give me the same output:
£,Â,£
I hope this helps!
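As a possible variation on the preg_match_all approach, here is a minimal sketch that skips the utf8_decode step (that function is deprecated as of PHP 8.2) and matches whole characters with the /u pattern modifier. It assumes the input is valid UTF-8, which the question does not state. The sample string is written with byte escapes so that it does not depend on the encoding of the source file.

```php
<?php
// Assumption: the input is valid UTF-8.
// "\xC2\xA3" is the UTF-8 encoding of £, "\xC3\x82" is the UTF-8 encoding of Â.
$asciistring = "a\xC2\xA3b\xC3\x82c\xC2\xA3d";

$display_string = '';
// With the /u modifier the pattern is applied to whole code points,
// so each match is a complete character rather than a single byte.
if (preg_match_all('/[^\x00-\x7F]/u', $asciistring, $matches)) {
    $display_string = implode(',', $matches[0]);
}

echo $display_string, "\n";
```

Note that the result stays in UTF-8 here, so the page receiving the AJAX response would need to be served as UTF-8 as well.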
You assume 1 character = 1 byte.
This assumption is wrong when it comes to UTF-8 / UTF-16 etc.
UTF-8 and similar encodings use multi-byte characters: in UTF-8, 1 character = 1 to 4 bytes.
So your loop over 8-bit bytes cannot handle multi-byte UTF-8 characters.
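To illustrate the point, here is a small sketch that assumes the string in the question is UTF-8 encoded (the question does not say so, but it would explain the symptom). It assumes the mbstring extension is loaded. In UTF-8, £ is stored as two bytes, so the byte-wise loop puts ", " between the two halves of one character and the result is no longer valid UTF-8, which browsers typically render as squares or replacement characters.

```php
<?php
// "\xC2\xA3" is the UTF-8 encoding of £ (U+00A3): one character, two bytes.
$pound = "\xC2\xA3";

$byteCount = strlen($pound);             // strlen counts bytes
$charCount = mb_strlen($pound, 'UTF-8'); // mb_strlen counts characters

// Same logic as the modified loop in the question: append ", " after
// every byte above 127. Both bytes of £ qualify, so the pair is split.
$broken = '';
for ($i = 0; $i < strlen($pound); $i++) {
    if (ord($pound[$i]) > 127) {
        $broken .= $pound[$i] . ', ';
    }
}

// Check whether the result is still a valid UTF-8 sequence.
$isValidUtf8 = mb_check_encoding($broken, 'UTF-8');

printf(
    "bytes=%d chars=%d hex=%s valid=%s\n",
    $byteCount,
    $charCount,
    bin2hex($broken),
    var_export($isValidUtf8, true)
);
```

The original loop, without the separator, appeared to work only because the two bytes ended up next to each other again in the output.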
Use the mb_... functions instead - multibyte string functions.
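A minimal sketch of what that could look like for the loop in the question, assuming the input is UTF-8 and the mbstring extension is available. The variable $found is a hypothetical helper introduced for illustration. The idea is to step through characters with mb_substr rather than through bytes, and to treat any character that occupies more than one byte as non-ASCII.

```php
<?php
// Assumption: the input is UTF-8 and the mbstring extension is loaded.
// "\xC2\xA3" is £ and "\xC3\x82" is Â, written as UTF-8 byte escapes.
$strDescription = "a\xC2\xA3b\xC3\x82c\xC2\xA3d";

$found = []; // hypothetical helper array, not in the original code
$length = mb_strlen($strDescription, 'UTF-8');

for ($i = 0; $i < $length; $i++) {
    // Extract one whole character, however many bytes it occupies.
    $char = mb_substr($strDescription, $i, 1, 'UTF-8');

    // In UTF-8, ASCII characters are exactly one byte long,
    // so anything longer than one byte is non-ASCII.
    if (strlen($char) > 1) {
        $found[] = $char;
    }
}

// Join whole characters, so the separator never lands inside one.
$display_string = implode(', ', $found);

echo $display_string, "\n";
```

Collecting the characters in an array and joining them at the end also avoids the trailing ", " that the concatenation in the question would leave behind.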
Additionally: converting ASCII to UTF-8 and vice versa is
My recommendation: it's worth the effort to switch everything, from development to production, to use UTF-8 throughout. Problems like this one go away afterwards.