Why doesn't the following regex match all of the inputs below?
$regex = '/\b(V|E)?\d{1,2}? ?\d{3} ?\d{3}\b/i';
I thought that (V|E)?\d{1,2}? ? would make the letter, the first one or two digits, and the first space optional.
<?php
$sms = array(
    'test test test 11 111 111 test test test',
    'test test test 1 111 111 test test test',
    'test test test 111 111 test test test', // does not match
    'test test test test test test 11111111',
    'test test test 1111111 test test test',
    'test test test 111111 test test test', // does not match
    'test test test E11 111 111 test test test',
    'test test test V1 111 111 test test test',
    'test test test V111 111 test test test', // does not match
    'test test test V11111111 test test test',
    'test test test V1111111 test test test',
    'test test test E111111 test test test', // does not match
    'test test test V 11 111 111 test test test',
    'test test test V 1 111 111 test test test',
    'test test test E 111 111 test test test', // does not match
    'test test test V 11111111 test test test',
    'test test test V 1111111 test test test',
    'test test test V 111111 test test test', // does not match
    'test test test V11 111 111 test test test',
    'test test test V1 111 111 test test test',
    'test test test E111 111 test test test', // does not match
    'test test test V11111111 test test test',
    'V1111111 test test test test test test',
    'test test test V111111 test test test', // does not match
);
$regex = '/\b(V|E)?\d{1,2}? ?\d{3} ?\d{3}\b/i';
$noMatches = 0;
$index = 0;
// Report every input the pattern fails to match, with its index.
foreach ($sms as $v) {
    $match = preg_match($regex, $v, $matches);
    if ($match) {
        //print_r($matches);
        //echo "$v match!\n";
        //$matches++;
    } else {
        echo "$index - $v does NOT match!\n";
        $noMatches++;
    }
    $index++;
}
$total = count($sms);
echo "\nTotal: $total\nNo Matches: $noMatches\n";
$ php test-regex.php
2 - test test test 111 111 test test test does NOT match!
5 - test test test 111111 test test test does NOT match!
8 - test test test V111 111 test test test does NOT match!
11 - test test test E111111 test test test does NOT match!
14 - test test test E 111 111 test test test does NOT match!
17 - test test test V 111111 test test test does NOT match!
20 - test test test E111 111 test test test does NOT match!
23 - test test test V111111 test test test does NOT match!
Total: 24
No Matches: 8
Using mario's suggestion, the regex is now $regex = '/\b(V|E)?\d{0,2} ?\d{3} ?\d{3}\b/i';
Why does this regex, in some cases, not capture the letter V or E?
$output = array(
    'test test test E11 111 111 test test test' => 'E11 111 111',
    'test test test V1 111 111 test test test' => 'V1 111 111',
    'test test test V111 111 test test test' => 'V111 111',
    'test test test V11111111 test test test' => 'V11111111',
    'test test test V1111111 test test test' => 'V1111111',
    'test test test E111111 test test test' => 'E111111',
    'test test test V 11 111 111 test test test' => '11 111 111', // Missing Letter
    'test test test V 1 111 111 test test test' => '1 111 111', // Missing Letter
    'test test test E 111 111 test test test' => 'E 111 111',
    'test test test V 11111111 test test test' => '11111111', // Missing Letter
    'test test test V 1111111 test test test' => '1111111', // Missing Letter
    'test test test V 111111 test test test' => 'V 111111',
    'test test test V11 111 111 test test test' => 'V11 111 111',
    'test test test V1 111 111 test test test' => 'V1 111 111',
    'test test test E111 111 test test test' => 'E111 111',
    'test test test V11111111 test test test' => 'V11111111',
    'V1111111 test test test test test test' => 'V1111111',
    'test test test V111111 test test test' => 'V111111',
    'V 1111111 test test test' => '1111111', // Missing Letter
    'test test test V 1111111 test test test' => '1111111', // Missing Letter
);
? is only an "optional" quantifier when it follows a group, a literal character, or a character class, e.g. (V|E)?.
If ? follows another quantifier (*, + or {n,m}), it just makes that quantifier lazy (less greedy): the quantifier will try to match as few characters as possible.
So \d{1,2}? does not mean "optional". It means "match one or two digits, but prefer to match just one". You meant to write \d{0,2} instead.
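A small PHP check of the difference between the lazy \d{1,2}? and the optional \d{0,2}. This is a sketch: the variable names ($lazy, $lazyOnEmpty and so on) are made up for the example, and the values in the comments follow from PCRE's lazy-quantifier rules.

```php
<?php
// \d{1,2}? is a lazy quantifier: it still requires at least one digit,
// it just prefers to take as few digits as possible.
preg_match('/\d{1,2}?/', '12', $m);
$lazy = $m[0];                                     // "1": only one digit taken

// It is not optional: an empty string has no digit to match.
$lazyOnEmpty = preg_match('/^\d{1,2}?$/', '');     // 0: no match

// \d{0,2} really is optional: zero digits are allowed.
$optionalOnEmpty = preg_match('/^\d{0,2}$/', '');  // 1: match

// Laziness only changes the preference: when the rest of the pattern
// needs it, the engine backtracks and takes the second digit as well.
preg_match('/^\d{1,2}?$/', '12', $m);
$lazyAnchored = $m[0];                             // "12"

var_dump($lazy, $lazyOnEmpty, $optionalOnEmpty, $lazyAnchored);
```

In short, appending ? to a quantifier never lowers its minimum; only changing the lower bound (here from 1 to 0) does.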
They don't match because the regex requires at least 7 digits in total:
/\b(V|E)?\d{1,2}? ?\d{3} ?\d{3}\b/
         |         |      |
         |         |      \--> 3 digits exactly
         |         \---------> 3 digits exactly
         \-------------------> 1 or 2 digits (prefers 1, but will match
                               2 if there are 8 digits in a row)
All the failing inputs are one digit short.
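To check the digit-count explanation, here is a quick sketch that runs the original pattern and mario's \d{0,2} variant over three inputs copied from the question. The names $relaxed, $inputs and $results are just illustrative.

```php
<?php
// The original pattern and mario's \d{0,2} variant, both from the question.
$original = '/\b(V|E)?\d{1,2}? ?\d{3} ?\d{3}\b/i';
$relaxed  = '/\b(V|E)?\d{0,2} ?\d{3} ?\d{3}\b/i';

// A few inputs taken from the question's test list.
$inputs = [
    'test test test 111 111 test test test',   // 6 digits, failed before
    'test test test V111111 test test test',   // 6 digits, failed before
    'test test test 1 111 111 test test test', // 7 digits, matched before
];

$results = [];
foreach ($inputs as $s) {
    // Without a $matches argument, preg_match_all() just returns the number
    // of matches, which here is the number of digits in the input.
    $digits = preg_match_all('/\d/', $s);
    $results[$s] = [$digits, preg_match($original, $s), preg_match($relaxed, $s)];
    printf("%d digits  original=%d  relaxed=%d  %s\n",
        $digits, $results[$s][1], $results[$s][2], $s);
}
```

The 6-digit inputs fail with the original pattern (minimum 1 + 3 + 3 = 7 digits) and match once the first quantifier's minimum drops to 0.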
If you want to make the whole first part optional, you must enclose it in parentheses and append a ? to the group. You can also use a character class, [VE], instead of the alternation V|E:
(?:[VE]\d{1,2} )?
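A minimal sketch of how the optional group behaves, using a hypothetical toy pattern anchored with ^ and $ so only the prefix behaviour is tested. It is not meant as a drop-in replacement for the full regex from the question.

```php
<?php
// Toy pattern (an assumption for illustration): the group (?:[VE]\d{1,2} )
// is letter + one or two digits + a space, and the trailing ? makes that
// whole group optional as a single unit.
$pattern = '/^(?:[VE]\d{1,2} )?\d{3}$/i';

// Pairs of [input, expected result of preg_match()].
$cases = [
    ['V1 111', 1],  // whole prefix present
    ['e12 111', 1], // the /i flag makes [VE] case-insensitive
    ['111', 1],     // whole prefix absent
    ['V 111', 0],   // partial prefix: the group also needs the digits
    ['1 111', 0],   // partial prefix: the group also needs the letter
];

$actual = [];
foreach ($cases as [$s, $expected]) {
    $got = preg_match($pattern, $s);
    $actual[] = $got;
    printf("%-8s %-8s (expected %s)\n", $s,
        $got ? 'match' : 'no match', $expected ? 'match' : 'no match');
}
```

Because the group is all-or-nothing, a letter without digits (or digits without a letter) cannot satisfy it, so those inputs must match the rest of the pattern on their own.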