Extracting text from PDF in PHP does not work for all PDF files

I am extracting text from PDF files. This is the code:

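<?php

// Load the PdfToText class from the same directory.
require 'PdfToText.php';

// Base name of the PDF; ".pdf" is appended below.
$file = 'SamplePF';
$pdf  = new PdfToText("$file.pdf");

// Print the extracted text.
echo $pdf->Text;

?>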

This class works fine for some PDF files. The problems I see with it are:

  1. For some PDF files, it returns text from seemingly random pages and lines instead of in page order.
  2. For some PDF files, it returns no output at all.
  3. For some PDF files, it extracts only one or two lines.

Please suggest a solution. Thank you!

I am not sure this is the exact reason the extraction fails for you, but I ran into something similar when extracting data from PDFs. Some PDF files are locked with an owner password, which places restrictions on the document: it can block editing, content copying, extraction and so on, in order to protect the copyright. Check this link for more info on owner passwords.

So you can first try removing the owner password and then extracting text from those PDFs. A number of tools for removing owner passwords are available online; choose whichever suits you best.
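
To find out which of your files are affected before you go looking for a tool, something like the following might help. It is only a rough sketch: `pdfLooksEncrypted` is a helper name I made up (it is not part of PdfToText), and it relies on the assumption that a password-protected PDF carries an `/Encrypt` entry in its trailer that can be found with a plain text search. It is a heuristic, not a full PDF parser, so treat the result as a hint.

```php
<?php

// Hypothetical helper, not part of PdfToText: heuristic check for the
// /Encrypt entry that a password-protected PDF has in its trailer.
function pdfLooksEncrypted(string $path): bool
{
    $data = @file_get_contents($path);
    if ($data === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // /Encrypt is followed either by an inline dictionary (<<)
    // or by an indirect reference such as "12 0 R".
    return preg_match('~/Encrypt\s*(?:<<|\d+\s+\d+\s+R)~', $data) === 1;
}

// Usage with the file name from the question.
$file = 'SamplePF.pdf';
if (is_file($file) && pdfLooksEncrypted($file)) {
    echo "$file appears to be password protected\n";
}
```

If this reports a file as protected, that would explain empty or partial output for that file. It would not explain text coming out in the wrong page order, so that problem may have a different cause.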