Let's say I have a string like this:
$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";
and I want to get an array like this:
array(
[0] => "http://foo.com/bar",
[1] => "https://bar.com",
[2] => "//foo.com/foo/bar"
);
I'm looking for something like:
preg_split("~((https?:)?//)~", $urlsString, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
where the PHP manual defines PREG_SPLIT_DELIM_CAPTURE as:
If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.
That said, the above preg_split call returns:
array (size=3)
0 => string '' (length=0)
1 => string 'foo.com/bar' (length=11)
2 => string 'bar.com//foo.com/foo/bar' (length=24)
Any idea what I'm doing wrong, or another way to do this?
PS: I was using this regex until I realized that it doesn't cover this case.
Edit:
As @sidyll pointed out, I'm missing the $limit argument in the preg_split call, so my flags were being passed as the limit. Anyway, there is something wrong with my regex, so I will use @WiktorStribiżew's suggestion.
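For reference, here is a minimal sketch of the preg_split route with the arguments in the right order ($limit = -1, flags fourth). It assumes PHP 7.4+ for the arrow function. The inner (https?:) group is made non-capturing, because with the original pattern PREG_SPLIT_DELIM_CAPTURE would also return the http:/https: fragments captured by that nested group. The pairing step is an assumption about the input: the string must start with a URL prefix, and every prefix must be followed by a non-empty piece.

```php
<?php
$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";

// $limit = -1 means "no limit"; the flags go in the 4th argument.
// (?:...) keeps the scheme group non-capturing, so only the whole
// prefix ("http://", "https://" or "//") is returned as a delimiter.
$parts = preg_split(
    '~((?:https?:)?//)~',
    $urlsString,
    -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);
// $parts: ["http://", "foo.com/bar", "https://", "bar.com", "//", "foo.com/foo/bar"]

// Glue each captured prefix back onto the piece that follows it.
// Assumes prefixes and pieces strictly alternate (see note above).
$urls = array_map(fn(array $pair) => implode('', $pair), array_chunk($parts, 2));

print_r($urls);
```

This works, but it needs the extra pairing step and breaks if two prefixes are adjacent (NO_EMPTY would drop the empty piece between them), which is one reason the preg_match_all approach is simpler.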
You may use preg_match_all with the following regex:
'~(?:https?:)?//.*?(?=$|(?:https?:)?//)~'
See the regex demo.
Details:
- (?:https?:)? - https: or http:, optional (1 or 0 times)
- // - a double /
- .*? - any 0+ chars other than a line break, as few as possible, up to the first position where
- (?=$|(?:https?:)?//) - a lookahead (it checks but does not consume) finds either of the two:
  - $ - the end of string
  - (?:https?:)?// - https: or http:, optional (1 or 0 times), followed by a double /
Below is a PHP demo:
$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";
preg_match_all('~(?:https?:)?//.*?(?=$|(?:https?:)?//)~', $urlsString, $urls);
print_r($urls[0]); // $urls[0] holds the full-pattern matches
// => Array ( [0] => http://foo.com/bar [1] => https://bar.com [2] => //foo.com/foo/bar )
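To see why the lookahead matters, the sketch below reruns the same pattern with the PREG_OFFSET_CAPTURE flag, which makes preg_match_all return each match together with its starting byte offset. Because the lookahead only checks for the next http://, https:// or // prefix without consuming it, each match ends exactly where the next one starts. It assumes PHP 7.1+ for the [$url, $offset] destructuring in foreach.

```php
<?php
$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";

// Same pattern as above; PREG_OFFSET_CAPTURE turns every match into
// a [matched string, byte offset] pair.
preg_match_all(
    '~(?:https?:)?//.*?(?=$|(?:https?:)?//)~',
    $urlsString,
    $matches,
    PREG_OFFSET_CAPTURE
);

foreach ($matches[0] as [$url, $offset]) {
    // Offsets are contiguous: the lookahead left the next prefix in place.
    printf("%2d  %s\n", $offset, $url);
}
// Prints:
//  0  http://foo.com/bar
// 18  https://bar.com
// 33  //foo.com/foo/bar
```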