How to use a regular expression to get the character after the digits

Suppose I have the following line:

1309270927C1642,61N654NONREF

I want to get the C or D that follows the leading digits. There are a few rules:

  1. The first 6 digits are always there
  2. The 4 digits after that are optional
  3. After that you have a D or a C.

I wanted to solve that with a lookbehind:

/(?<=\d{6,10})D|C/ but that is not allowed in PHP.

So I tried a non-capturing group /(?:\d{6,10})D|C/. But that captures 1309270927C instead of just C.

So my question is: how can I capture just the D or the C?

You can use the PCRE \K operator:

\d{6,10}\K[DC]

It drops everything matched before the D or C from the reported match. You can tweak the regex further by adding characters to, or removing them from, the character class [DC].

Have a look at the example.

Sample code:

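// \K discards the digits matched so far, so only the letter is reported
$re = '/\d{6,10}\K[DC]/';
$str = '1309270927C1642,61N654NONREF';
preg_match_all($re, $str, $matches);
// $matches[0] holds the full matches; print one per line
echo implode("\n", $matches[0]);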
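
If the rules from the question are taken literally (six digits, optionally followed by exactly four more), the pattern can be tightened so that runs of 7, 8 or 9 digits are rejected. The following is only a sketch under that reading; the ^ anchor, the extra sample lines and the variable names are assumptions made for illustration, not part of the original data.

```php
<?php
// Assumption: the digit block starts the line and is exactly 6 or 10 digits long.
$strict = '/^\d{6}(?:\d{4})?\K[DC]/';

$lines = [
    '1309270927C1642,61N654NONREF', // 10 digits, then C
    '130927D1642,61N654NONREF',     // hypothetical line: 6 digits, then D
    '13092709C1642,61N654NONREF',   // hypothetical line: 8 digits, should be rejected
];

$results = [];
foreach ($lines as $line) {
    // preg_match returns 1 on a match; $m[0] then holds only the letter.
    $results[] = preg_match($strict, $line, $m) ? $m[0] : null;
}

var_dump($results);
```

Whether the stricter form is wanted depends on how reliable the input is; \d{6,10} is more forgiving if the digit count can vary.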

Also, here is some more information on the \K operator:

The \K "keep out" verb is available in Perl, PCRE (C, PHP, R…) and Ruby 2+. \K tells the engine to drop whatever it has matched so far from the match to be returned.

Instead of (?<=\b\d+_)[A-Z]+, you can therefore use \b\d+_\K[A-Z]+
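
As a rough illustration of that rewrite in PHP, the sketch below runs the \K form against a made-up string. The lookbehind form is not attempted, since an unbounded lookbehind such as \d+ is what PHP rejected in the question.

```php
<?php
// Hypothetical input: digit runs followed by an underscore and capital letters.
$subject = 'id 42_ABC and 7_XY, but not _ZZ';

// \b\d+_ has to match first; \K then discards it from the reported match.
preg_match_all('/\b\d+_\K[A-Z]+/', $subject, $found);

print_r($found[0]);
```

Only the letters that follow a digits-plus-underscore prefix are returned; _ZZ is skipped because no digits precede the underscore.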

The limitations of \K:

Compared with lookbehinds, both the \K and capture group workarounds have limitations:

✽ When you look for multiple matches in a string, at the starting position of each match attempt, a lookbehind can inspect the characters behind the current position in the string. Therefore, against 123, the pattern (?<=\d)\d (match a digit preceded by a digit) will match both 2 and 3. In contrast, \d\K\d can only match 2, as the starting position after the first match is immediately before the 3, and there are not enough digits left for a second match. Likewise, \d(\d) can only capture 2.
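
The 123 example can be checked directly. This is a small sketch using preg_match_all; the variable names are arbitrary.

```php
<?php
$subject = '123';

// Lookbehind: the preceding digit is inspected but never consumed.
preg_match_all('/(?<=\d)\d/', $subject, $lookbehind);

// \K: the first digit is consumed, so the next attempt starts just before the 3.
preg_match_all('/\d\K\d/', $subject, $keepOut);

// Capture group: same consumption problem as \K.
preg_match_all('/\d(\d)/', $subject, $captured);

print_r($lookbehind[0]); // both 2 and 3
print_r($keepOut[0]);    // only 2
print_r($captured[1]);   // only 2
```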

✽ With lookbehinds, you can impose multiple conditions (similar to our password validation technique) by using multiple lookbehinds. For instance, to match a digit that is preceded by a lower-case Greek letter, you can use (?<=\p{Ll})(?<=\p{Greek})\d. The first lookbehind (?<=\p{Ll}) ensures that the character immediately to the left is a lower-case letter, and the second lookbehind (?<=\p{Greek}) ensures that the character immediately to the left belongs to the Greek script. With the workarounds, you could use \p{Greek}\K\d to match a digit preceded by a character in the Greek script (or \p{Greek}(\d) to capture it), but you cannot impose a second condition. To get over this limitation, you could capture the Greek character and use a second regex to check that it is a lower-case letter.
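
Both approaches from the paragraph above can be sketched in PHP as follows. The sample string is an assumption chosen so that only one digit qualifies, and the /u modifier is needed so that the pattern and subject are treated as UTF-8.

```php
<?php
// Hypothetical sample: lower-case alpha, upper-case Alpha and a Latin letter,
// each followed by a digit (written with \u{} escapes to avoid encoding issues).
$subject = "\u{03B1}1 \u{0391}2 a3";

// Two lookbehinds test the same preceding character.
preg_match_all('/(?<=\p{Ll})(?<=\p{Greek})\d/u', $subject, $direct);

// Workaround: capture the Greek character, then check its case with a second regex.
preg_match_all('/(\p{Greek})(\d)/u', $subject, $pairs, PREG_SET_ORDER);
$twoStep = [];
foreach ($pairs as $pair) {
    // $pair[1] is the Greek character, $pair[2] the digit after it.
    if (preg_match('/^\p{Ll}$/u', $pair[1])) {
        $twoStep[] = $pair[2];
    }
}

print_r($direct[0]);
print_r($twoStep);
```

Both arrays should end up containing just the digit that follows the lower-case alpha.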

Output of the sample code above:

C

I would use a capturing subpattern, like this:

$string = "1309270927C1642,61N654NONREF";
$pattern = '/\d{6,10}(C|D)/';
preg_match($pattern, $string, $matches);
// $matches[1] contains the contents of the first subpattern
echo $matches[1];