I am using this code to convert PDF to text. It works fine, but it does not support Swedish characters. For example:
correct Swedish word = incorrect output
Förnamn = Fšrnamn,
Försäljningsdatum = FšrsŠljningsdatum,
varumärket = varumŠrket,
terförsäljaruppgifter = terfšrsŠljaruppgifter
The code is:
<?php
require_once "pdf.pdf2text.inc";
$filename = "customerfile.pdf";
$pdf = new Pdf(urldecode($filename));
print utf8_decode($pdf->getText()); // with utf8_decode()
print $pdf->getText(); // without utf8_decode()
?>
I tried adding UTF-8 encoding/decoding, but it does not work with this code.
Can anybody help me or suggest how to show the proper text (words) using this code?
Thanks in advance.
iconv() might be a possibility:
$myUnicodeString = "Åäö";
echo iconv("UTF-8", "ISO-8859-1", $myUnicodeString);
As some comments on http://php.net/manual/fr/function.utf8-decode.php say, utf8_decode() alone is not enough to handle accents.
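One thing that may be worth checking: the pattern in the question (ö shown as š, ä shown as Š) is what you get when Mac OS Roman bytes 0x9A and 0x8A are displayed as Windows-1252. If that is what is happening here, converting from Mac OS Roman to UTF-8 might be the missing step. The sketch below assumes that getText() returns the raw single-byte string and that your iconv build knows the "MACINTOSH" charset (not every build does). macRomanToUtf8() is a hypothetical helper, not part of pdf.pdf2text.inc.

```php
<?php
// Hypothetical helper (not part of pdf.pdf2text.inc).
// Assumption: the extracted text is Mac OS Roman, one byte per character.
function macRomanToUtf8(string $bytes): string
{
    // "MACINTOSH" is the iconv name for Mac OS Roman; whether it is
    // available depends on the iconv implementation PHP was built with.
    $converted = iconv('MACINTOSH', 'UTF-8', $bytes);

    // iconv() returns false on failure; fall back to the untouched input.
    return $converted === false ? $bytes : $converted;
}

// In Mac OS Roman, byte 0x9A is "ö" and byte 0x8A is "ä".
echo macRomanToUtf8("F\x9Arnamn"), "\n";
echo macRomanToUtf8("F\x9Ars\x8Aljningsdatum"), "\n";
```

With the code from the question this would be used as print macRomanToUtf8($pdf->getText()); and the page would have to be served as UTF-8. If the output is still wrong, the text is probably not Mac OS Roman and this guess does not apply.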
According to a comment on Drupal.org from Saubhagya:
add the octal and unicode equivalents of desired characters in array $_pdfDocToUni line 18 file initialize.pdf2text.inc (remember octal need to be in 3 digits as in other entries of array).
Then just go to line 335 of pdf2text.module and add your character in the same format of other ones.
https://www.drupal.org/node/1079780
Not sure about the use of the word "just" but it might be a help...
This appears to be the module he is talking about, and it does have the array he mentioned. Perhaps your version is missing some files; there seem to be a lot of variants on offer:
http://cgit.drupalcode.org/pdf2text/tree/pdf2text.module?id=a15059bc1531aa336fef255397ba362c81c9fce5
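To give an idea of what "add the octal and unicode equivalents" could mean in practice, here is a rough sketch for PHP 7 or later. It does not reproduce the real $_pdfDocToUni array, whose exact layout I have not checked; $octalToUnicode and decodeOctalEscapes() are hypothetical names, and the three octal codes are the Mac OS Roman positions of ä, ö and å, which is only an assumption about how the PDF in the question is encoded.

```php
<?php
// Hypothetical stand-in for the module's lookup array; the real
// $_pdfDocToUni may be laid out differently.
// Keys are 3-digit octal codes as they appear in PDF string escapes.
$octalToUnicode = [
    '212' => "\u{00E4}", // ä (assumed Mac OS Roman byte 0x8A)
    '232' => "\u{00F6}", // ö (assumed Mac OS Roman byte 0x9A)
    '214' => "\u{00E5}", // å (assumed Mac OS Roman byte 0x8C)
];

// Hypothetical helper: replaces \ddd octal escapes with mapped characters.
function decodeOctalEscapes(string $raw, array $map): string
{
    $result = preg_replace_callback(
        '/\\\\([0-7]{3})/',              // a backslash followed by exactly 3 octal digits
        function (array $m) use ($map): string {
            return $map[$m[1]] ?? $m[0]; // leave unknown escapes untouched
        },
        $raw
    );

    // preg_replace_callback() returns null on error; keep the input then.
    return $result ?? $raw;
}

// Single quotes keep the backslashes literal, as in a raw PDF string.
echo decodeOctalEscapes('F\232rs\212ljningsdatum', $octalToUnicode), "\n";
```

Any character missing from the map passes through unchanged, so new entries can be added one at a time while checking the output.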