How to get page numbers from the examples (using PHP)

I have several variants of filenames.

How can I get 123.pdf, 124.pdf and 125.pdf from them? The length of the filenames can vary; 14-5678 is not relevant this time and should be ignored.

  • 14-5678_jobname_0123_.p1.PDF
  • 14-5678_jobname_0123_.p2.PDF
  • 14-5678_jobname_0125_.p1.PDF
  • Weired_filename_0123_bla_14-5678_jobname.p1.PDF
  • Weired_filename_0123_bla_14-5678_jobname.p2.PDF
  • Weired_filename_0125_bla_14-5678_jobname.p1.PDF
  • 14-5678_jobname_0123.p1.PDF
  • 14-5678_jobname_0123.p2.PDF
  • 14-5678_jobname_0125.p1.PDF
  • 0123_14-5678_jobname.p1.PDF
  • 0123_14-5678_jobname.p2.PDF
  • 0125_14-5678_jobname.p1.PDF
  • jobname_0123_14-5678.p1.PDF
  • jobname_0123_14-5678.p2.PDF
  • jobname_0125_14-5678.p1.PDF

I've tried for hours with regex testers and am now totally stuck. I would love some PHP code that can do this job.

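You need to match a run of four digits that is not preceded by a dash:

/(?<!-)(\d{4})/

Decomposing the regex:

  • (?<!-): negative lookbehind, the previous character is not a dash
  • \d{4}: four digits
  • (\d{4}): capture the digits

A plain [^-] in front of the digits also works in the middle of a name, but it consumes a character, so it cannot match names that start with the digits, such as 0123_14-5678_jobname.p1.PDF.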
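The pattern can be tried out in isolation. The sketch below is one way to do that, assuming PHP 7.1 or later; the helper name extractJobNumber is hypothetical, and it uses a negative lookbehind, (?<!-), for the "not preceded by a dash" condition so that the digits may also sit at the very start of a name.

```php
<?php
// Hypothetical helper (not from the original answer): returns the first
// four-digit run that is not preceded by a dash, or null if there is none.
function extractJobNumber(string $name): ?string
{
    // (?<!-) only asserts "no dash just before"; it consumes nothing,
    // so a name that begins with the digits still matches.
    if (preg_match('/(?<!-)(\d{4})/', $name, $m) === 1) {
        return $m[1];
    }
    return null;
}

// One sample per filename shape from the question.
$samples = [
    '14-5678_jobname_0123_.p1.PDF',
    'Weired_filename_0125_bla_14-5678_jobname.p1.PDF',
    '0123_14-5678_jobname.p2.PDF',
    'jobname_0125_14-5678.p1.PDF',
];

foreach ($samples as $name) {
    echo $name . ' => ' . extractJobNumber($name) . '.pdf' . PHP_EOL;
}
```

The 5678 in 14-5678 is skipped because it directly follows a dash, which is what keeps the job prefix out of the result.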

You can then add .pdf to get your file name.

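Example with preg_replace, assuming the file names you've given above are stored in an array $files:

foreach ($files as $f) {
    // single quotes keep the backslashes and $1 away from PHP string interpolation
    echo "$f => " . preg_replace('/.*?[^-]*(\d{4}).+/', '$1.pdf', $f) . PHP_EOL;
}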
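To run the loop on its own, the array has to exist first. The following is a minimal sketch under a few assumptions: only some of the file names from the question are listed, notes.txt is a made-up entry that shows the no-match case, and the $names map is purely illustrative.

```php
<?php
// A few of the file names from the question; 'notes.txt' is a made-up
// entry without a four-digit run, added to show the no-match case.
$files = [
    '14-5678_jobname_0123_.p1.PDF',
    'Weired_filename_0123_bla_14-5678_jobname.p2.PDF',
    '0125_14-5678_jobname.p1.PDF',
    'notes.txt',
];

$pattern = '/.*?[^-]*(\d{4}).+/';
$names = [];

foreach ($files as $f) {
    // preg_replace() returns the subject unchanged when nothing matches,
    // so check for a match first instead of trusting the result blindly.
    if (preg_match($pattern, $f) === 1) {
        $names[$f] = preg_replace($pattern, '$1.pdf', $f);
    } else {
        $names[$f] = null;
    }
}

foreach ($names as $f => $pdf) {
    echo "$f => " . ($pdf ?? '(no match)') . PHP_EOL;
}
```

Without the preg_match check, a name such as notes.txt would pass through preg_replace untouched and could be mistaken for a valid result.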

ETA: if you want to factor in the page number, you could use this code:

foreach ($files as $f) {
    # capture the four digits of the PDF name and the number in p1/p2;
    # skip any name that does not fit the pattern
    if (!preg_match('/.*?[^-]*(\d{4}).*?p(\d+)\.pdf/i', $f, $matches)) {
        continue;
    }
    # p1 keeps the captured number, p2 adds 1, p3 adds 2, and so on
    $number = (int) $matches[1] + (int) $matches[2] - 1;
    # format the PDF name to be four digits long, with zero padding for shorter numbers
    echo "$f => " . sprintf('%04d.pdf', $number) . PHP_EOL;
}

Output:

14-5678_jobname_0123_.p1.PDF => 0123.pdf
14-5678_jobname_0123_.p2.PDF => 0124.pdf
14-5678_jobname_0125_.p1.PDF => 0125.pdf
Weired_filename_0123_bla_14-5678_jobname.p1.PDF => 0123.pdf
Weired_filename_0123_bla_14-5678_jobname.p2.PDF => 0124.pdf
Weired_filename_0125_bla_14-5678_jobname.p1.PDF => 0125.pdf
14-5678_jobname_0123.p1.PDF => 0123.pdf
14-5678_jobname_0123.p2.PDF => 0124.pdf
14-5678_jobname_0125.p1.PDF => 0125.pdf
0123_14-5678_jobname.p1.PDF => 0123.pdf
0123_14-5678_jobname.p2.PDF => 0124.pdf
0125_14-5678_jobname.p1.PDF => 0125.pdf
jobname_0123_14-5678.p1.PDF => 0123.pdf
jobname_0123_14-5678.p2.PDF => 0124.pdf
jobname_0125_14-5678.p1.PDF => 0125.pdf
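
If the same logic is needed in more than one place, it could be wrapped in a function. This is only a sketch, assuming PHP 7.1 or later: the name targetPdfName and the $pad flag are hypothetical. The flag is there because the question asks for 123.pdf while the output above keeps the leading zero.

```php
<?php
// Hypothetical helper wrapping the loop body above: returns the target PDF
// name, or null when the file name does not fit the expected shape.
function targetPdfName(string $file, bool $pad = true): ?string
{
    if (preg_match('/.*?[^-]*(\d{4}).*?p(\d+)\.pdf/i', $file, $m) !== 1) {
        return null;
    }
    // p1 keeps the base number, p2 is base + 1, and so on.
    $number = (int) $m[1] + (int) $m[2] - 1;
    // %04d keeps the leading zero (0124.pdf); %d drops it (124.pdf).
    return sprintf($pad ? '%04d.pdf' : '%d.pdf', $number);
}

echo targetPdfName('14-5678_jobname_0123_.p2.PDF') . PHP_EOL;        // 0124.pdf
echo targetPdfName('0125_14-5678_jobname.p1.PDF', false) . PHP_EOL;  // 125.pdf
var_dump(targetPdfName('readme.txt'));                               // NULL
```

Returning null for names that do not match lets the caller decide whether to skip, log, or reject them.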