I have the following link structure:
/type1
/type2
/type3
These links correspond to the site's default language. Unfortunately, the client didn't want the default language prefixed to the URL (which would have been more consistent), so only the other languages have URLs like:
/en
/en/type1
/de/type2
/de
/fr/type3
/fr
There are a lot of other variables, but only this part is dynamic. My regex starts as follows:
(en|de|fr)?\/?(type1|type2|type3)?\/?
which basically means: capture the language if it exists, then capture the page if it exists. But it produces many more matches than required, and it also matches the empty string, etc.
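To make the problem concrete, here is a small PHP check (a sketch, not part of the original post) using the loose pattern above. Because every part is optional and nothing is anchored, it succeeds on an empty string and splits `/en/type1` into several partial matches.

```php
<?php
// The question's pattern: every part is optional and nothing is anchored.
$loose = '~(en|de|fr)?\/?(type1|type2|type3)?\/?~';

// It succeeds even on an empty string, because it is allowed to match nothing.
var_dump(preg_match($loose, ''));   // int(1)

// Unanchored, it also yields several matches for one URL, including an empty one.
preg_match_all($loose, '/en/type1', $all);
print_r($all[0]);                   // $all[0]: ['/', 'en/type1', '']
```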
I'm trying to figure out how to capture all these options:
/en
/en/type1
/type1
in one expression, if possible. How can I make at least one of the two groups required, so that the URL contains either one or both of them, but never neither? I looked at backreferences in conjunction with lookaheads, but I think I'm missing some crucial information here...
I would like to preserve the groups so that group 1 = language and group 2 = page.
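One way to encode the "at least one group" rule, sketched here in PHP and not taken from the answers: anchor the pattern with `^...$` so the whole path can only be `/lang`, `/page`, `/lang/page` (or empty), then add a `(?=.)` lookahead so the empty string is rejected. Group numbering stays as requested (group 1 = language, group 2 = page). `parse_path` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns [language, page], or null if the path is invalid.
// An empty language means "default language"; an empty page means "no page".
function parse_path(string $path): ?array
{
    // ^...$ forces the whole path to match; (?=.) requires at least one character,
    // so at least one of the two optional segments must be present.
    $pattern = '~^(?=.)(?:/(en|de|fr))?(?:/(type1|type2|type3))?$~';
    if (!preg_match($pattern, $path, $m)) {
        return null;
    }
    // PHP leaves trailing unmatched groups out of $m, hence the ?? '' fallbacks.
    return [$m[1] ?? '', $m[2] ?? ''];
}

foreach (['/en', '/en/type1', '/type1', '', '/', '/english'] as $p) {
    echo var_export($p, true), ' => ', json_encode(parse_path($p)), "\n";
}
```

The lookahead only has to exclude the empty string, because the anchors already rule out every other shape.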
I can't think of a way to do what you want with a single regex. But another possibility is to use a single regex just to validate the URL patterns you want, and then use a short PHP script to extract the language (if it exists) and the page:
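<?php
$path = "/de/type1";
// Both segments are optional in the pattern, so it also matches ""; reject that explicitly.
if ($path !== "" && preg_match("/^(?:\/(?:en|de|fr))?(?:\/(?:type1|type2|type3))?$/i", $path)) {
    // "/de/type1" -> ["", "de", "type1"], "/type1" -> ["", "type1"]
    $parts = explode("/", $path);
    if (count($parts) == 3) {
        echo "language: " . $parts[1] . ", page: " . $parts[2];
    }
    else if (preg_match("/^(?:en|de|fr)$/i", $parts[1])) {
        // a single segment that is a language code
        echo "language: " . $parts[1] . ", page:";
    }
    else {
        // a single segment that is a page, so the default language applies
        echo "language: default, page: " . $parts[1];
    }
}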
This is the pattern I used for matching:
^(?:/(?:en|de|fr))?(?:/(?:type1|type2|type3))?$
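It matches a page (/type1, /type2 or /type3), optionally preceded by a language segment, as well as a language segment on its own. Because both parts are optional, it also matches the empty string, so that case has to be excluded separately.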
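If you would rather get the result back as data than echo it, the same validate-then-split approach can be wrapped in a function. This is a sketch; `split_path` and the `language`/`page` keys are hypothetical names chosen for illustration.

```php
<?php
// Hypothetical wrapper around the validate-then-split approach above.
// Returns ['language' => ..., 'page' => ...], or null when the path is not valid.
function split_path(string $path): ?array
{
    $valid = '/^(?:\/(?:en|de|fr))?(?:\/(?:type1|type2|type3))?$/i';
    // The pattern also matches "", so reject the empty path explicitly.
    if ($path === '' || !preg_match($valid, $path)) {
        return null;
    }
    $parts = explode('/', $path);   // "/de/type1" -> ["", "de", "type1"]
    if (count($parts) === 3) {
        return ['language' => $parts[1], 'page' => $parts[2]];
    }
    if (preg_match('/^(?:en|de|fr)$/i', $parts[1])) {
        // a lone language segment, no page
        return ['language' => $parts[1], 'page' => ''];
    }
    // a lone page segment, default language
    return ['language' => 'default', 'page' => $parts[1]];
}

foreach (['/de/type1', '/fr', '/type2'] as $p) {
    echo $p, ' => ', json_encode(split_path($p)), "\n";
}
```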
This one gives you one or the other (whichever comes first), but it won't capture both when both are present (e.g. for /en/type3 it gives you only /en):
<?php
$pat = '~(/(?:en|de|fr)\b|/type\d\b)~';
$test = ['/en', '/type1', '/en/type1', '/en/type3', '/english/type1'];
foreach ($test as $t) {
    // $match[1] holds whichever alternative matched first (language or page)
    if (preg_match($pat, $t, $match)) {
        echo "'{$t}' = '{$match[1]}'\n";
    }
}
?>
Which gives you:
'/en' = '/en'
'/type1' = '/type1'
'/en/type1' = '/en'
'/en/type3' = '/en'
'/english/type1' = '/type1'
(The last example demonstrates why the \b is needed in the pattern.)
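If you need both parts when both are present, rather than only the first, one option (a sketch, not from the original answer) is to run the same pattern through `preg_match_all`, which collects every match in order. Like the original, it does not validate the overall URL shape or reject extra segments.

```php
<?php
$pat = '~(/(?:en|de|fr)\b|/type\d\b)~';

// preg_match_all collects every segment the pattern matches, not only the first,
// so a URL carrying both a language and a page yields both.
foreach (['/en', '/type1', '/en/type3', '/english/type1'] as $t) {
    preg_match_all($pat, $t, $matches);
    echo "'{$t}' = ", implode(', ', $matches[1]), "\n";
}
```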