PHP preg_match_all gives an offset of -1 in addition to the correct matches

This appears to be strange behavior, or perhaps I don't understand regular expressions so well...

I'm using this to find all the xref and trailer objects in a PDF file:

preg_match_all('@(
xref?
)|(\strailer\s)@',$pdfcontent,$matches,PREG_OFFSET_CAPTURE);

print_r gives me this:

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] =>
xref
                    [1] => 13235519
                )

            [1] => Array
                (
                    [0] =>
trailer
                    [1] => 13299371
                )
        )

    [1] => Array
        (
            [0] => Array
                (
                    [0] =>
xref
                    [1] => 13235519
                )

            [1] => Array
                (
                    [0] =>
                    [1] => -1
                )
        )

    [2] => Array
        (
            [0] =>
            [1] => Array
                (
                    [0] =>
trailer
                    [1] => 13299371
                )
        )
)

Why is there an offset of -1 in the xref group?

It seems this is the normal behaviour, though it is mostly undocumented. An offset of -1 marks a capture group that did not take part in a given match.

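To answer the question in your title: the -1 offset is returned instead of a match, not in addition to one. Your pattern is an alternation of two capture groups, (a)|(b), so each match fills only one of them. It can very well return a text and offset for the xref group in one match, and a non-match for that same group when the trailer alternative is the one that matched.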

This is not mentioned explicitly in the PHP manual page. But the PCRE documentation covers it briefly:

[...] When this happens, both values in the offset pairs corresponding to unused subpatterns are set to -1.

You can reproduce it with a simpler example:

preg_match_all('/(a)|(b)|(c)/', "abc", $m, PREG_OFFSET_CAPTURE)
and print_r($m);

The behaviour is a bit confusing. It seems the -1 offset is only used for unmatched groups that come before the group that actually matched. Unmatched groups after it are just absent from the result array (they show up as an empty string instead of a text/offset pair, as in your [2][0] entry). In this example the offsets are [0,-1,-1] for group 1, [undef,1,-1] for group 2 and [undef,undef,2] for group 3. I would conclude it's some hazy behaviour in the PHP wrapper.
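
If you only need to know which alternative matched and where, you can skip both kinds of placeholder instead of relying on their exact shape. This is a minimal sketch, not a definitive implementation: classify_matches and its $names map are hypothetical names made up for this example, and it assumes an unmatched group is either missing, an empty string, or a pair whose offset is -1.

```php
<?php
// Hypothetical helper (not part of PHP): list which alternative matched at which offset.
// $names maps a capture group number to a label of your choice.
function classify_matches(string $pattern, string $subject, array $names): array
{
    $hits = [];
    // PREG_SET_ORDER groups the results per match instead of per capture group.
    preg_match_all($pattern, $subject, $sets, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    foreach ($sets as $set) {
        foreach ($names as $group => $name) {
            // Skip groups that did not take part in this match: they may be
            // missing, an empty string, or a [text, -1] pair.
            if (isset($set[$group]) && is_array($set[$group]) && $set[$group][1] >= 0) {
                $hits[] = ['type' => $name, 'offset' => $set[$group][1]];
            }
        }
    }
    return $hits;
}

$hits = classify_matches('/(a)|(b)|(c)/', 'abc', [1 => 'a', 2 => 'b', 3 => 'c']);
print_r($hits);
```

For your pattern you would pass something like [1 => 'xref', 2 => 'trailer'] as the map.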

It seems to me you have two xref entries without a trailer in between. Something like:

xref
shgfjqhfkj

xref
 shgfjqhfkj
 trailer 

In that case the capture groups no longer line up with each other.

I'd change the regex to use a single group:

'@(
xref?
|\strailer\s)@'
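
With a single capture group no group is ever left unset, so no -1 placeholders appear, and the captured text tells you which keyword was found. A rough sketch of the idea follows. The pattern /\s(xref|trailer)\s/ is a simplified stand-in for the one above, and $sample is made-up text for illustration, not real PDF content.

```php
<?php
// Made-up sample standing in for the PDF content (an assumption for this sketch).
$sample = "1 0 obj\nxref\n0 1\ntrailer\n<< >>\nxref\n0 2\ntrailer\n<< >>\n";

// Both keywords live in one capture group, so group 1 is set for every match.
preg_match_all('/\s(xref|trailer)\s/', $sample, $m, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

$found = [];
foreach ($m as $set) {
    // $set[1] is the [text, offset] pair of the single capture group.
    [$keyword, $offset] = $set[1];
    $found[] = "$keyword@$offset";
}
print_r($found);
```

Because the results come back in document order, you can walk $found to check whether each xref is followed by a trailer.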