I have a regex that matches two patterns, with one of them at the front or the back, but the first result array contains two empty elements. Why does that happen, and how can I stop it?
$text = "i did";
preg_match("~(?:(did) (.+)|(.+) (did))~", $text, $match);
print_r($match);
echo "<br>";
$text = "did i";
preg_match("~(?:(did) (.+)|(.+) (did))~", $text, $match);
print_r($match);
Result:
Array ( [0] => i did [1] => [2] => [3] => i [4] => did )
Array ( [0] => did i [1] => did [2] => i )
Wanted result:
Array ( [0] => i did [1] => i [2] => did )
Array ( [0] => did i [1] => did [2] => i )
You can use a branch reset group, (?|...):
Alternatives inside a branch reset group share the same capturing groups. The syntax is (?|regex), where (?| opens the group and regex is any regular expression.
Your preg_match will look like:
preg_match("~(?|(did) (.+)|(.+) (did))~", $text, $match);
See IDEONE demo
Results:
Array
(
[0] => i did
[1] => i
[2] => did
)
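As for why the empty entries appear: in the original pattern the capturing groups are numbered left to right across both alternatives, so groups 1 and 2 belong to the first alternative and groups 3 and 4 to the second. When only the second alternative matches, groups 1 and 2 are still listed, as empty strings, because they come before a group that did match. The snippet below is a small sketch of that difference; the helper name captures is made up for this illustration.

```php
<?php
// Hypothetical helper (not from the original post): run a pattern, return the match array.
function captures(string $pattern, string $text): array
{
    preg_match($pattern, $text, $m);
    return $m;
}

// Plain non-capturing group: the alternatives own groups 1-2 and 3-4 respectively.
$plain = '~(?:(did) (.+)|(.+) (did))~';
// Branch reset group: both alternatives reuse group numbers 1 and 2.
$reset = '~(?|(did) (.+)|(.+) (did))~';

print_r(captures($plain, 'i did')); // 5 entries, [1] and [2] are empty strings
print_r(captures($reset, 'i did')); // 3 entries: "i did", "i", "did"
print_r(captures($reset, 'did i')); // 3 entries: "did i", "did", "i"
```

With "did i" the plain pattern already returns three entries, because unmatched groups at the end of the array are simply left out.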
I guess your regex is a sample one. If you need to match a word after or before did, use the \w shorthand class:
preg_match("~(?|(did) (\w+)|(\w+) (did))~", $text, $match);
See another demo
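To illustrate what the \w change buys you, here is a short sketch comparing the two patterns on a longer input. The input string and the helper name first_match are assumptions made up for this example.

```php
<?php
// Hypothetical helper for this illustration only.
function first_match(string $pattern, string $text): array
{
    preg_match($pattern, $text, $m);
    return $m;
}

$text = 'did it today';

// .+ is greedy and also matches spaces, so it takes the rest of the line.
print_r(first_match('~(?|(did) (.+)|(.+) (did))~', $text));   // [2] => it today

// \w+ stops at the first non-word character, so only one word is captured.
print_r(first_match('~(?|(did) (\w+)|(\w+) (did))~', $text)); // [2] => it
```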
This is a modified version that behaves as you wish:
$text1 = "i did";
preg_match("~(did|\w+(?= did)) (did|(?<=did )\w+)~", $text1, $match1);
print_r($match1);
$text2 = "did i";
preg_match("~(did|\w+(?= did)) (did|(?<=did )\w+)~", $text2, $match2);
print_r($match2);
$text3 = "did x, x did";
preg_match_all("~(did|\w+(?= did)) (did|(?<=did )\w+)~", $text3, $match3);
print_r($match3);
$text4 = "a a";
preg_match("~(did|\w+(?= did)) (did|(?<=did )\w+)~", $text4, $match4);
print_r($match4);
An online version here
Note: the regex relies on how alternation (|) works: the alternatives are tried from left to right, and the engine stops at the first one that lets the overall match succeed.
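For reference, this is what the four test calls should produce, written as a small self-checking sketch. The function name did_pairs is made up for this example; the pattern is the one from the answer.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
function did_pairs(string $text): array
{
    // Group 1: "did", or a word that is followed by " did".
    // Group 2: "did", or a word that is preceded by "did ".
    $pattern = '~(did|\w+(?= did)) (did|(?<=did )\w+)~';
    preg_match_all($pattern, $text, $m, PREG_SET_ORDER);
    return $m; // one [full match, group 1, group 2] array per match
}

print_r(did_pairs('i did'));        // one match: "i did", "i", "did"
print_r(did_pairs('did i'));        // one match: "did i", "did", "i"
print_r(did_pairs('did x, x did')); // two matches: "did x" and "x did"
print_r(did_pairs('a a'));          // no match: empty array
```

PREG_SET_ORDER is used here only so that each match comes back as its own array; the answer's preg_match_all call uses the default ordering, which groups the results by capture group instead.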