I'm parsing incoming email and posting the email content to a website. The email may contain Japanese or English. The Japanese displays correctly about 99% of the time, but every now and then a character is swapped for another, or the text displays as garbage.
Here's the code being used to get the proper encoding for the email body:
$post->content = quoted_printable_decode($parser->getMessageBody('text'));
$isISO2022 = $parser->isISO2022();
$post->content = ($isISO2022)
    ? mb_convert_encoding($post->content, 'UTF-8', 'iso-2022-jp')
    : mb_convert_encoding($post->content, 'UTF-8', mb_detect_encoding($post->content));
$post->save();
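For comparison, here is a minimal sketch of the same conversion step that takes the charset as an explicit argument and only falls back to detection when none is given. The function name `convertBodyToUtf8` and the candidate list are assumptions made for illustration; they are not part of the parser or of the code above. The sketch passes an explicit candidate list and strict mode to `mb_detect_encoding`, whereas the code above relies on the default detection order.

```php
<?php
// Hypothetical helper (not from the original code): convert a decoded
// message body to UTF-8, preferring a known charset over detection.
function convertBodyToUtf8(string $body, ?string $charset = null): string
{
    if ($charset === null) {
        // Assumed candidate list; adjust it to the mail you actually receive.
        $candidates = ['ISO-2022-JP', 'UTF-8', 'EUC-JP', 'SJIS'];

        // Strict mode (third argument) rejects candidates for which the
        // input contains invalid byte sequences.
        $detected = mb_detect_encoding($body, $candidates, true);
        if ($detected === false) {
            // Nothing matched; return the body unchanged rather than guess.
            return $body;
        }
        $charset = $detected;
    }

    return mb_convert_encoding($body, 'UTF-8', $charset);
}
```

Called as `convertBodyToUtf8($decodedBody, 'ISO-2022-JP')`, it skips detection entirely, which may be the safer path whenever the message declares its charset.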
The parser's isISO2022 function:
public function isISO2022() {
    $isISO2022 = false;
    foreach ($this->parts as $part) {
        if (isset($part['headers']['content-type']) && preg_match('/iso-2022-jp/i', $part['headers']['content-type'])) {
            $isISO2022 = true;
        }
    }
    return $isISO2022;
}
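The function above returns a single flag for the whole message, so it reports true if any part mentions iso-2022-jp. As a hedged alternative, the sketch below pulls the declared charset out of one Content-Type header value so each part can be converted with its own charset. The name `charsetFromContentType` is hypothetical, and it assumes the header value is a plain string such as `text/plain; charset="iso-2022-jp"`.

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type
// header value, or return null when no charset is declared.
function charsetFromContentType(string $contentType): ?string
{
    // Matches charset=xyz and charset="xyz", case-insensitively.
    if (preg_match('/charset\s*=\s*"?([^";\s]+)/i', $contentType, $m)) {
        return $m[1];
    }
    return null;
}
```

The returned name could then be passed as the source encoding to `mb_convert_encoding` for that part, with detection used only when the result is null.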
Does anyone have any ideas what's going on?
Added: I have heard that iso-2022-jp does not support some specific characters, and that iso-2022-jp-ms should be used instead, but when I try to use iso-2022-jp-ms, it is reported as an invalid encoding. It also seems to me that the characters I've seen displayed incorrectly are basic characters that should be universally supported.
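One way to narrow down the invalid encoding report is to ask mbstring which encoding names the installed build actually lists. The sketch below does that with `mb_list_encodings()`; the function name `supportedEncodings` and the list of names being checked are assumptions for illustration. Note that it compares against canonical names only, so a name that is merely an alias may show as unsupported here even if mbstring accepts it.

```php
<?php
// Hypothetical helper: report which of the wanted encoding names appear
// in the list of encodings this mbstring build supports.
function supportedEncodings(array $wanted): array
{
    // Normalise case so 'iso-2022-jp-ms' and 'ISO-2022-JP-MS' compare equal.
    $known = array_map('strtoupper', mb_list_encodings());

    $result = [];
    foreach ($wanted as $name) {
        $result[$name] = in_array(strtoupper($name), $known, true);
    }
    return $result;
}

// Prints a name => bool map for a few ISO-2022-JP related names.
print_r(supportedEncodings(['ISO-2022-JP', 'ISO-2022-JP-MS', 'CP50220', 'CP50221']));
```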