I am trying to match the first hexadecimal address in a line that can contain several hexadecimal addresses, but I get the last one instead.
My call is:
preg_match('%.*(0x[0-9a-f]{8}){1}.*%', $v, $current_match);
where $v is a string like:
Line: 2 libdispatch.dylib 0x36eaed55 0x36eae000 + 3413
I want to get 0x36eaed55, but with my regular expression $current_match[1] contains 0x36eae000 instead.
According to the PHP documentation, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
The problem is that the * quantifier is greedy by default, so the first .* matches as much as possible while still allowing the entire expression to match. The engine first lets .* consume the whole line, then backtracks one character at a time until (0x[0-9a-f]{8}){1} can match. Scanning backwards like this, the first place where the group succeeds is the last hexadecimal constant, so .* "gobbles up" all of the constants except that one.
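A small PHP sketch that reproduces the question's result with the original greedy pattern, using the sample string from the question as $v:

```php
<?php
// Sample line from the question.
$v = 'Line: 2 libdispatch.dylib 0x36eaed55 0x36eae000 + 3413';

// Greedy .* first runs to the end of the string, then backtracks until
// (0x[0-9a-f]{8}) can match. Backtracking from the end reaches the LAST
// address first, so that is what group 1 captures.
preg_match('%.*(0x[0-9a-f]{8}){1}.*%', $v, $current_match);

echo $current_match[1], PHP_EOL; // 0x36eae000
```

The capture group is still "the first captured parenthesized subpattern", as the documentation says; it simply captured the last address because of where greedy .* left it.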
One solution is to use the non-greedy (lazy) quantifier *?. The first constant is found with the following:
preg_match('%.*?(0x[0-9a-f]{8}){1}.*?%', $v, $current_match);
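For comparison, here is a sketch of the lazy version run on the same sample string. It also shows that the trailing .*? contributes nothing, because a lazy quantifier at the end of a pattern matches the empty string:

```php
<?php
$v = 'Line: 2 libdispatch.dylib 0x36eaed55 0x36eae000 + 3413';

// Lazy .*? starts by matching nothing and grows one character at a time,
// so the group binds to the FIRST address it reaches.
preg_match('%.*?(0x[0-9a-f]{8}){1}.*?%', $v, $current_match);

echo $current_match[1], PHP_EOL; // 0x36eaed55
echo $current_match[0], PHP_EOL; // Line: 2 libdispatch.dylib 0x36eaed55
```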
However, since you know that $v contains a hexadecimal constant and you want the first one, why not simply match the pattern of the hexadecimal constant itself? This pattern has no capturing group, so the address is in $current_match[0] rather than $current_match[1]:
preg_match('%0x[0-9a-f]{8}%', $v, $current_match);
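A minimal sketch of that approach, assuming the same sample string. preg_match() scans left to right and stops at the first match, so no prefix is needed; checking the return value (1 on a match, 0 otherwise) guards against lines without an address:

```php
<?php
$v = 'Line: 2 libdispatch.dylib 0x36eaed55 0x36eae000 + 3413';

// No capturing group: the whole match, $current_match[0], is the address.
if (preg_match('%0x[0-9a-f]{8}%', $v, $current_match) === 1) {
    echo $current_match[0], PHP_EOL; // 0x36eaed55
}
```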
Even if you wanted the second, third, fourth, ... hexadecimal constant, you could use preg_match_all() with the same pattern; with PREG_PATTERN_ORDER, $all_matches[0] lists every match in order:
preg_match_all('%0x[0-9a-f]{8}%', $v, $all_matches, PREG_PATTERN_ORDER);
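A sketch of picking a specific address by position with preg_match_all(), again on the sample string. The function returns the number of full matches, and $all_matches[0][$n] is the (n+1)-th address:

```php
<?php
$v = 'Line: 2 libdispatch.dylib 0x36eaed55 0x36eae000 + 3413';

// PREG_PATTERN_ORDER (also the default) puts all full-pattern matches
// in $all_matches[0], in the order they appear in the string.
$count = preg_match_all('%0x[0-9a-f]{8}%', $v, $all_matches, PREG_PATTERN_ORDER);

echo $count, PHP_EOL;             // 2
echo $all_matches[0][0], PHP_EOL; // 0x36eaed55 (first address)
echo $all_matches[0][1], PHP_EOL; // 0x36eae000 (second address)
```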
The first .* tries to match as much as possible, so it swallows your first hex address as well. Try making it non-greedy: .*?
That's because your first .* is greedy. You can fix it by changing your regexp to:
preg_match('%(0x[0-9a-f]{8})%', $v, $current_match);
or
preg_match('%.*?(0x[0-9a-f]{8})%', $v, $current_match);
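If you keep the capture group so the result stays in index 1, one way to package it is a small helper. The function name first_hex_address() is hypothetical, and it uses the first of the two patterns above (the leading .*? is not required, since preg_match() already returns the leftmost match):

```php
<?php
/**
 * Hypothetical helper: return the first 8-digit hex address in $line,
 * or null if the line contains none.
 */
function first_hex_address(string $line): ?string
{
    // The capture group puts the address in $m[1].
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match('%(0x[0-9a-f]{8})%', $line, $m) === 1) {
        return $m[1];
    }
    return null;
}

var_dump(first_hex_address('Line: 2 libdispatch.dylib 0x36eaed55 0x36eae000 + 3413')); // string(10) "0x36eaed55"
var_dump(first_hex_address('no addresses here')); // NULL
```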
You can also use the ungreedy modifier, "U", which inverts greediness so that quantifiers such as .* are lazy by default:
preg_match('%.*(0x[0-9a-f]{8}){1}.*%U', $v, $m);
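A sketch of the U modifier on the sample string. Because both .* quantifiers become lazy, the leading one stops at the first address and the trailing one matches nothing, so the full match ends right after the address. The fixed counts {8} and {1} are unaffected:

```php
<?php
$v = 'Line: 2 libdispatch.dylib 0x36eaed55 0x36eae000 + 3413';

// With U, .* behaves like .*? (and .*? would behave like greedy .*).
preg_match('%.*(0x[0-9a-f]{8}){1}.*%U', $v, $m);

echo $m[1], PHP_EOL; // 0x36eaed55
echo $m[0], PHP_EOL; // Line: 2 libdispatch.dylib 0x36eaed55
```

Note that U changes every quantifier in the pattern, which can surprise a later reader; writing .*? explicitly keeps the intent visible where it matters.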