I have a PHP script that uses Imagick, but it risks a NaN error if a user-supplied PDF contains no pages, or has a page with zero height or zero width. I am not sure whether either is possible in a PDF structure. Requesting a JPEG of a page number larger than the page count will also cause an error. In general, can a valid PDF wrapper be sent without any actual page content?
In the function below I assume a height or width of 0 might be possible, and guard against it with if($imH==0){$imH=1;}, but basing code on an assumption doesn't feel right.
Parts of the function were adapted from a gist by umidjons: https://gist.github.com/umidjons/11037635
PHP code:
function genPdfThumbnail ( $src, $targ, $size=256, $page=1 ){
if(file_exists($src) && !is_dir($src)): // source path must be available and cannot be a directory
if(mime_content_type($src) != 'application/pdf'){return FALSE;} // source is not a pdf file returns a failure
$sepa = '/'; // using '/' as path separation for nfs on linux.
$targ = dirname($src).$sepa.$targ;
$size = intval($size); // only use as integer, default is 256
$page = intval($page); // only use as integer, default is 1
$page--; // default page 1, must be treated as 0 hereafter
if ($page<0){$page=0;} // we cannot have negative values
$img = new Imagick($src."[$page]");
$imH = $img->getImageHeight();
$imW = $img->getImageWidth();
if ($imH==0) {$imH=1;} // if the pdf page has no height use 1 instead
if ($imW==0) {$imW=1;} // if the pdf page has no width use 1 instead
$sizR = round($size*(min($imW,$imH)/max($imW,$imH))); // relative pixels of the shorter side
$img -> setImageColorspace(255); // prevent image colors from inverting
$img -> setImageBackgroundColor('white'); // set background color before flatten
$img = $img->flattenImages(); // prevent black zones on transparency in pdf
$img -> setimageformat('jpeg');
if ($imH == $imW){$img->thumbnailimage($size,$size);} // square page
if ($imH < $imW) {$img->thumbnailimage($size,$sizR);} // landscape page orientation
if ($imH > $imW) {$img->thumbnailimage($sizR,$size);} // portrait page orientation
if(!is_dir(dirname($targ))){mkdir(dirname($targ),0777,true);} // if not there make target directory
$img -> writeimage($targ);
$img -> clear();
$img -> destroy();
if(file_exists( $targ )){ return $targ; } // return the path to the new file for further processing
endif;
return FALSE; // source file not available or Imagick didn't create jpeg file, returns a failure
}
Call the function like this, for example:
$newthumb = genPdfThumbnail('/nfs/vsp/server/u/user/public_html/any.pdf','thumbs/any.p01.jpg',150,'01');
Sure, a PDF file is a container format that can contain pretty much anything, including (only) metadata with 0 pages. But even so, with this code it's quite possible to request a thumbnail for page 21 on a document that only contains 5 pages.
If that happens, the problem will occur on this line:
$img = new Imagick($src."[$page]");
This will throw an exception if the provided page does not exist. You can catch that exception and handle it however you want:
try {
    $img = new Imagick($src."[$page]");
} catch (ImagickException $error) {
    return false;
}
If you want to read the number of pages beforehand, you can try to let Imagick parse the document first:
$pdf = new Imagick($src);
$pages = $pdf->getNumberImages();
The function name is a bit misleading, see this comment in the PHP manual:
"For PDFs this function indicates the number of pages on the PDF, NOT images that might be embedded within the PDF."
Here as well, if the PDF document is invalid in some way, this can throw an exception so you might want to catch that and handle it:
try {
    $pdf = new Imagick($src);
    $pages = $pdf->getNumberImages();
} catch (ImagickException $error) {
    return false;
}
// $page is zero-based at this point in the function, so the last valid index is $pages - 1
if ($page >= $pages) {
    return false;
}
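The page count and the bounds check can be combined in a short sketch. The helper names countPdfPages and resolvePageIndex are hypothetical, and using pingImage() instead of the constructor is my own assumption: it reads image attributes without loading pixel data, which may be cheaper, though I have not measured it for PDFs. Unlike the original function, which clamps negative pages to 0, this version rejects any out-of-range request.

```php
<?php
/**
 * Count the pages of a PDF (hypothetical helper).
 * Returns null if Imagick is unavailable or the file cannot be parsed.
 */
function countPdfPages(string $src): ?int
{
    if (!class_exists('Imagick')) {
        return null; // Imagick extension not loaded
    }
    try {
        $pdf = new Imagick();
        $pdf->pingImage($src);            // read attributes only, no pixel data
        $count = $pdf->getNumberImages(); // for PDFs: number of pages
        $pdf->clear();
        return $count;
    } catch (ImagickException $error) {
        return null; // missing, unreadable or invalid document
    }
}

/**
 * Convert a one-based page number (int or numeric string such as '01')
 * into a zero-based index, or null if it is out of range (hypothetical helper).
 */
function resolvePageIndex($requested, int $pageCount): ?int
{
    $index = (int) $requested - 1;
    if ($index < 0 || $index >= $pageCount) {
        return null;
    }
    return $index;
}
```

Inside genPdfThumbnail you could then return FALSE whenever either helper yields null, before constructing new Imagick($src."[$index]").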
A PDF needs to contain at least one page in its Page Tree so you can't have a zero page PDF that's valid. If you had such a PDF and your PDF software read it as valid and reported zero pages for it, then that software would be quite misleading.
Acrobat will display a dialog with an error message in such a case if I'm recalling correctly and I'd imagine most other PDF software would similarly complain.
A PDF's page boundaries are defined by rectangles, and I can find no restriction in the specification that forbids zero width or height. Practically speaking, though, such a page would be bizarre, and most PDF software would likely complain about it or trip over it.
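If you would rather reject such a page than substitute a height or width of 1, a guard could look like the sketch below. The function name thumbDimensions is hypothetical. It takes the values from getImageWidth() and getImageHeight() and returns the width and height to pass to thumbnailimage(), with the longer side scaled to $size, or null when a dimension is not positive.

```php
<?php
/**
 * Compute thumbnail dimensions so that the longer side equals $size
 * (hypothetical helper). Returns [width, height], or null when the
 * page or the requested size has no positive extent.
 */
function thumbDimensions(int $width, int $height, int $size): ?array
{
    if ($width <= 0 || $height <= 0 || $size <= 0) {
        return null; // nothing sensible to render, let the caller fail
    }
    $scale = $size / max($width, $height);
    return [
        max(1, (int) round($width * $scale)),  // never round down to 0 pixels
        max(1, (int) round($height * $scale)),
    ];
}
```

This also replaces the three separate orientation checks with one calculation, since the same formula covers square, landscape and portrait pages.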
You can certainly have a PDF page with no content, e.g. a blank 8.5x11" page is perfectly valid. If you like, you could fall back to that, or to a page with some text or an image telling the user that an error occurred.