I'm using PHP on Ubuntu and have run into a common problem that I haven't found a solution for. I upload a PDF file and convert it to a text file (using ImageMagick + Tesseract):
$output = shell_exec('convert -density 300 ' . escapeshellarg($fichier) . ' ' . escapeshellarg($fichier_noExt . '.png'));
$output = shell_exec('tesseract ' . escapeshellarg($fichier_noExt . '.png') . ' ' . escapeshellarg($fichier_noExt . '.txt'));
When I print the result like this:
$file = fopen($fichier_txt.'.txt', 'r');
echo stream_get_contents($file); // echoing $file itself would only print the resource id
I get 'Â°' instead of '°', 'â‚¬' instead of '€' and 'Ã©' instead of 'é'. I know it's an encoding issue, but I can't locate it.
Oh dear...
I just forgot to add this at the top of my file:
header('Content-Type: text/html; charset=utf-8');
It works now. Sorry for wasting your time, but I needed a fresh look :).
Have a nice day and cya!
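For anyone wondering why the missing charset header produces this kind of garbled output, here is a minimal sketch. It assumes the browser fell back to Windows-1252 when no charset was declared (a common default, but an assumption here), and `misreadAsWindows1252` is a hypothetical helper name, not something from the post above.

```php
<?php
// Hypothetical helper: shows what UTF-8 text looks like when its bytes
// are wrongly decoded as Windows-1252 (assumed browser fallback).
function misreadAsWindows1252(string $utf8): string
{
    // Treat every byte as one Windows-1252 character, then re-encode as UTF-8.
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

// Degree sign, euro sign and e-acute, written as escapes so the example
// does not depend on the encoding of this source file.
foreach (["\u{B0}", "\u{20AC}", "\u{E9}"] as $char) {
    echo $char, ' => ', misreadAsWindows1252($char), PHP_EOL;
}
```

Each character is stored as two or three bytes in UTF-8, and each of those bytes is shown as a separate character when the page is read as Windows-1252. Declaring `charset=utf-8` tells the browser to group the bytes correctly.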
If you want to make sure the string you print is UTF-8, you can try this:
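// Read the whole file: fixed-size chunks could split a multibyte character.
$content = file_get_contents($fichier_txt.'.txt');
// Detect the encoding from the content itself, not from the file handle.
$encoding = mb_detect_encoding($content, array('UTF-8', 'ISO-8859-1'), true);
echo mb_convert_encoding($content, 'UTF-8', $encoding);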
Documentation:
http://php.net/manual/fr/function.mb-convert-encoding.php
http://php.net/manual/fr/function.mb-detect-encoding.php
You can also use dos2unix and mac2unix to normalize the file's line endings, with a custom function like this:
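function convertFiles($file) { // pass complete path to file
    $arg = escapeshellarg($file); // quote the path so spaces and metacharacters are safe
    foreach (array('dos2unix', 'mac2unix') as $cmd) {
        // shell_exec() returns NULL when there is no output, so check the exit code instead
        exec("$cmd $arg", $out, $status);
        if ($status !== 0) {
            return FALSE;
        }
    }
    return TRUE;
}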
You can install these commands with apt-get install (see http://xmodulo.com/how-to-install-dos2unix-on-linux.html).
Finally, if you display the text on a web page, don't forget to declare the charset in the content type:
header('Content-Type: text/html; charset=utf-8');
or the HTML version:
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
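Putting these pieces together, a page that displays the OCR text could look roughly like the sketch below. `renderOcrPage` is a hypothetical helper name, and the fallback to ISO-8859-1 for bytes that are not valid UTF-8 is an assumption; adjust it to whatever your OCR step actually produces.

```php
<?php
// Hypothetical helper: wraps OCR text in a minimal HTML page declared as UTF-8.
function renderOcrPage(string $text): string
{
    // Assumption: anything that is not valid UTF-8 is treated as ISO-8859-1.
    if (!mb_check_encoding($text, 'UTF-8')) {
        $text = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
    }
    // Escape the text so characters such as < and & are shown literally.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    return '<!DOCTYPE html><html><head><meta charset="utf-8"></head>'
         . '<body><pre>' . $safe . '</pre></body></html>';
}
```

It would be called after the header has been sent and before any other output, for example:

header('Content-Type: text/html; charset=utf-8');
echo renderOcrPage(file_get_contents($fichier_txt.'.txt'));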