Suppose I have the following line:
1309270927C1642,61N654NONREF
Now I want to get the C or D that comes after the first digits. There are a few rules here regarding the D or C. I wanted to solve that with a lookbehind:
/(?<=\d{6,10})D|C/
but that is not allowed in PHP, because the lookbehind is not fixed-length.
So I tried a non-capturing group, /(?:\d{6,10})D|C/, but that matches 1309270927C instead of just C.
So my question is: how can I capture just the D or C?
You can use the PCRE \K operator:
\d{6,10}\K[DC]
It omits everything matched up to the D or C from the reported match. You can further tweak this regex by allowing or disallowing more characters in the character class [DC].
Have a look at the example.
Sample code:
$re = '/\d{6,10}\K[DC]/';
$str = "1309270927C1642,61N654NONREF";
preg_match_all($re, $str, $matches);
echo $matches[0][0]; // first full match, without the leading digits
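To see why a non-capturing group does not help while \K does, here is a small sketch comparing the two on the string from the question. Using [DC] in both patterns is an assumption made here so that they differ only in the \K; the variable names are illustrative.

```php
<?php
// String from the question.
$str = '1309270927C1642,61N654NONREF';

// A non-capturing group only avoids creating a capture group;
// the digits it matches are still part of the overall match.
preg_match('/(?:\d{6,10})[DC]/', $str, $withGroup);

// \K resets the start of the reported match, so the digits are dropped.
preg_match('/\d{6,10}\K[DC]/', $str, $withK);

echo $withGroup[0], PHP_EOL; // 1309270927C
echo $withK[0], PHP_EOL;     // C
```

In both cases the digits must still be present for the pattern to match; \K only changes what is reported.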
Also, here is some more information on the \K operator:
The \K "keep out" verb, which is available in Perl, PCRE (C, PHP, R…)
and Ruby 2+. \K tells the engine to drop whatever it has matched so
far from the match to be returned.
Instead of (?<=\b\d+_)[A-Z]+, you can therefore use \b\d+_\K[A-Z]+
The limitations of \K:
Compared with lookbehinds, both the \K and capture group workarounds have limitations:
✽ When you look for multiple matches in a string, at the starting position of each match attempt, a lookbehind can inspect the characters behind the current position in the string. Therefore, against 123, the pattern (?<=\d)\d (match a digit preceded by a digit) will match both 2 and 3. In contrast, \d\K\d can only match 2, as the starting position after the first match is immediately before the 3, and there are not enough digits left for a second match. Likewise, \d(\d) can only capture 2.
✽ With lookbehinds, you can impose multiple conditions (similar to our password validation technique) by using multiple lookbehinds. For instance, to match a digit that is preceded by a lower-case Greek letter, you can use (?<=\p{Ll})(?<=\p{Greek})\d. The first lookbehind (?<=\p{Ll}) ensures that the character immediately to the left is a lower-case letter, and the second lookbehind (?<=\p{Greek}) ensures that the character immediately to the left belongs to the Greek script. With the workarounds, you could use \p{Greek}\K\d to match a digit preceded by a character in the Greek script (or \p{Greek}(\d) to capture it), but you cannot impose a second condition. To get over this limitation, you could capture the Greek character and use a second regex to check that it is a lower-case letter.
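Both limitations can be reproduced with a short script. This is a sketch rather than a full treatment: the Greek test string is an assumption chosen for illustration, and the /u modifier is needed so that PCRE treats the subject as UTF-8.

```php
<?php
// Lookbehind: each match attempt may look left at already-consumed text.
preg_match_all('/(?<=\d)\d/', '123', $lookbehind);

// \K: the first digit is consumed, so the next attempt starts right before "3".
preg_match_all('/\d\K\d/', '123', $keepOut);

// Two lookbehinds at the same position impose two conditions at once.
// Test string: Greek lowercase alpha, Greek uppercase alpha and Latin "a",
// each followed by a digit.
$text = "\u{03B1}1 \u{0391}2 a3";
preg_match_all('/(?<=\p{Ll})(?<=\p{Greek})\d/u', $text, $greekLower);

echo implode(',', $lookbehind[0]), PHP_EOL; // 2,3
echo implode(',', $keepOut[0]), PHP_EOL;    // 2
echo implode(',', $greekLower[0]), PHP_EOL; // 1
```

Only the digit after the lower-case Greek letter passes both lookbehinds; the upper-case Greek letter fails the first condition and the Latin letter fails the second.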
Running the sample code on 1309270927C1642,61N654NONREF gives a single match:
C
I would use a capturing subpattern, like this:
$string = "1309270927C1642,61N654NONREF";
$pattern = '/\d{6,10}(C|D)/';
preg_match($pattern, $string, $matches);
// $matches[1] contains the contents of the first subpattern
echo $matches[1];
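If the value is needed in several places, the capturing approach can be wrapped in a small helper that also handles the case where nothing matches. This is only a sketch: the function name extractIndicator is hypothetical, and the character class [CD] is used here as an equivalent of (C|D) inside the capturing group.

```php
<?php
// Hypothetical helper: returns "C" or "D", or null when the pattern is absent.
function extractIndicator(string $line): ?string
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match('/\d{6,10}([CD])/', $line, $matches) === 1) {
        return $matches[1]; // contents of the first capturing group
    }
    return null;
}

echo extractIndicator('1309270927C1642,61N654NONREF'), PHP_EOL; // C
var_dump(extractIndicator('no digits here'));                   // NULL
```

Checking the return value of preg_match before reading $matches[1] avoids an undefined-index warning on lines that do not follow the expected format.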