I am new to encoding, so please be patient. I am working on a system where a user uploads a CSV file; I need to display its content and then save it in the database (UTF-8 encoding).
I have been asked to fix an issue with some French characters that weren't displayed correctly. I have almost solved the problem: I can now display characters such as
ÀàÂâÆÄäÇçÉéÈèÊêËëÎîÏïÔôœÖöÙùÛûÜüÿ
However, the two mentioned in the title, Ÿ and Œ, are not yet displayed correctly on the webpage.
Here is my PHP code so far:
// say in the csv we have "ÖüÜߟÀàÂ"
$content = file_get_contents(addslashes($file_name));
var_dump($content); // output: string(54) "���ߟ��� "
if(!mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true)){
$data = iconv('macintosh', 'UTF-8', $content);
}
// deal with known encoding types
else if(mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true) == 'ISO-8859-1'){
//$data = mb_convert_encoding($content, 'UTF-8', mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true)); // does not work
$data = iconv('ISO-8859-1', 'UTF-8', $content); //does not work
}else if(mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true) == 'UTF-8'){
$data = $content;
}
// if I print $data, "Ÿ Œ " are not printed out... they get lost somewhere
//do more stuff here
The file I am dealing with has an encoding type of ISO-8859-1 (when I print out mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true), it displays ISO-8859-1).
Does anyone have an idea how to deal with these special cases?
The characters Ÿ and Œ are not representable in ISO-8859-1. It seems that the incoming data is actually windows-1252 (Windows Latin 1) encoded, since windows-1252 has graphic characters, including Ÿ and Œ, in some code positions that are reserved for control characters in ISO-8859-1.
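To make the difference concrete, here is a small sketch that converts the same two bytes under both assumptions. The byte values 0x9F and 0x8C are where windows-1252 places Ÿ and Œ; the variable name $bytes is only for illustration and is not from the question's code.

```php
<?php
// Ÿ and Œ as they would appear in a windows-1252 encoded file.
$bytes = "\x9F\x8C";

// Read as ISO-8859-1, these bytes map to the C1 control characters
// U+009F and U+008C, which a browser does not render.
echo bin2hex(iconv('ISO-8859-1', 'UTF-8', $bytes)), "\n";   // c29fc28c

// Read as windows-1252, they map to U+0178 (Ÿ) and U+0152 (Œ).
echo bin2hex(iconv('Windows-1252', 'UTF-8', $bytes)), "\n"; // c5b8c592
```

This would explain why the characters seem to "get lost": the conversion succeeds, but it produces invisible control characters instead of the intended letters.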
So you should probably add windows-1252 to the list of recognized encodings and treat recognized ISO-8859-1 as windows-1252, i.e., use iconv('windows-1252', 'UTF-8', $content) even when ISO-8859-1 has been recognized. Windows-1252 data mislabeled as ISO-8859-1 is very common.
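A minimal sketch of that approach follows. The function name to_utf8 is hypothetical, and the sketch makes simplifying assumptions: it checks for valid UTF-8 with mb_check_encoding instead of mb_detect_encoding, treats everything else as windows-1252, and leaves out the 'macintosh' fallback from the question.

```php
<?php
// Hypothetical helper (not from the question): normalise uploaded CSV bytes to UTF-8.
function to_utf8(string $content): string
{
    // Content that is already valid UTF-8 is returned unchanged.
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content;
    }

    // Assumption: anything else is windows-1252, including data that
    // would be detected or labelled as ISO-8859-1.
    $converted = iconv('Windows-1252', 'UTF-8', $content);

    // iconv returns false if it meets a byte it cannot convert;
    // a few byte values are unassigned in windows-1252.
    if ($converted === false) {
        throw new RuntimeException('Content is not valid windows-1252.');
    }

    return $converted;
}

$data = to_utf8("\x9F \x8C"); // windows-1252 bytes for "Ÿ Œ"
echo $data, "\n";
```

Because the printable characters of ISO-8859-1 sit at the same code positions in windows-1252, converting genuine ISO-8859-1 text this way should give the same visible result, so the sketch needs no separate ISO-8859-1 branch.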