PHP encoding problem with PDF files

Using PHP on Ubuntu, I'm facing a common problem for which I haven't found a solution. I upload a PDF file and convert it into a text file (using ImageMagick + Tesseract):

    $output = shell_exec('convert -density 300 ' . $fichier . ' ' . $fichier_noExt . '.png');
    $output = shell_exec('tesseract ' . $fichier_noExt . '.png ' . $fichier_noExt . '.txt');

When I do this:

    // fopen() only returns a resource handle, so read the contents before echoing
    echo file_get_contents($fichier_txt . '.txt');

I get 'Â°' instead of '°', 'â‚¬' instead of '€' and 'Ã©' instead of 'é'. I know it's an encoding issue, but I can't locate it.

Oh dear...

I just forgot to add this at the top of my file:

    header('Content-Type: text/html; charset=utf-8');

It works now. Sorry for wasting your time, but I needed a fresh look :).

Have a nice day and cya !
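
The symptom in the question is what UTF-8 bytes look like when they are decoded as a single-byte charset, which is why declaring UTF-8 fixes it. A minimal sketch of that effect, assuming the mbstring extension is loaded and the script itself is saved as UTF-8; the helper name `asMisreadLatin1` is hypothetical:

```php
<?php
// Hypothetical helper: show how a UTF-8 string looks when its bytes
// are wrongly decoded as ISO-8859-1 (one byte = one character).
function asMisreadLatin1(string $utf8): string
{
    // Treat each byte as an ISO-8859-1 character and re-encode it as UTF-8.
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex('é'), "\n";          // prints "c3a9": two bytes in UTF-8
echo asMisreadLatin1('é'), "\n";  // prints "Ã©": each byte shown as its own character
```

Nothing is wrong with the file in this case; only the declared charset of the page is.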

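If you want to print the result as a UTF-8 string, you can try this:

    // Read the whole file so multibyte characters are not split across chunks
    $contents = file_get_contents($fichier_txt . '.txt');
    // mb_detect_encoding() expects the string itself, not the file handle
    $encoding = mb_detect_encoding($contents, ['UTF-8', 'ISO-8859-1'], true);
    echo mb_convert_encoding($contents, 'UTF-8', $encoding ?: 'UTF-8');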

Documentation :

http://php.net/manual/fr/function.mb-convert-encoding.php

http://php.net/manual/fr/function.mb-detect-encoding.php
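
Encoding detection is a guess, so one alternative is to test whether the bytes are already valid UTF-8 and convert only when they are not. A minimal sketch, assuming the mbstring extension is loaded; the function name `toUtf8` and the ISO-8859-1 fallback are assumptions, not something the question specifies:

```php
<?php
// Hypothetical helper: return the input as UTF-8, converting only when needed.
function toUtf8(string $bytes, string $fallback = 'ISO-8859-1'): string
{
    // Already valid UTF-8: leave it untouched to avoid double encoding.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    // Otherwise assume the fallback charset (an assumption, adjust as needed).
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

$latin1 = "\xE9t\xE9";                // "été" encoded as ISO-8859-1
echo bin2hex(toUtf8($latin1)), "\n";  // prints "c3a974c3a9"
```

Passing text that is already UTF-8 returns it unchanged, so the function can be applied to the OCR output without knowing its encoding in advance.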

You can also use dos2unix and mac2unix to convert the file's line endings with this custom function:

    function convertFiles($file) { // pass the complete path to the file
        $arg = escapeshellarg($file); // quote the path for the shell
        // exec() exposes the exit status; 0 means the command succeeded
        exec("dos2unix $arg", $out, $status);
        if ($status !== 0) {
            return false;
        }
        exec("mac2unix $arg", $out, $status);
        return $status === 0;
    }

You can install these commands with apt-get install; see http://xmodulo.com/how-to-install-dos2unix-on-linux.html

Finally, if you display it on a web page, don't forget to set the charset in the content type:

    header('Content-Type: text/html; charset=utf-8');

or the HTML version:

    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
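
Both declarations can be combined in one page. A minimal sketch, assuming the text is already UTF-8; the function name `renderTextPage` is hypothetical and the sample string stands in for the contents of the .txt file:

```php
<?php
// Hypothetical helper: wrap already-UTF-8 text in a page that declares its charset.
function renderTextPage(string $text): string
{
    // Escape HTML special characters; multibyte characters are left as they are.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html>\n<head>\n"
         . "<meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n"
         . "</head>\n<body>\n<pre>" . $safe . "</pre>\n</body>\n</html>\n";
}

// The header must be sent before any output; it is skipped when run from the CLI.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo renderTextPage("20 °C, 5 €, été <ok>");
```

When both are present, the HTTP header is the declaration the browser uses first, so it is the more important one to get right.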