Extract URLs from a string when there are no spaces between them

Let's say I have a string like this:

$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";

and I want to get an array like this:

array(
    [0] => "http://foo.com/bar",
    [1] => "https://bar.com",
    [2] => "//foo.com/foo/bar"
);

I'm looking for something like:

preg_split("~((https?:)?//)~", $urlsString, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);

where the definition of PREG_SPLIT_DELIM_CAPTURE is:

If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.

That said, the above preg_split returns:

array (size=3)
  0 => string '' (length=0)
  1 => string 'foo.com/bar' (length=11)
  2 => string 'bar.com//foo.com/foo/bar' (length=24)

Any idea what I'm doing wrong, or any other approach?

PS: I was using this regex until I realized that it doesn't cover this case.

Edit:

As @sidyll pointed out, I'm missing the $limit argument in the preg_split call, so my flags were being read as the limit. Even with that fixed, there is something wrong with my regex, so I will use @WiktorStribiżew's suggestion.
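For reference, here is a rough sketch of what the call returns once the flags are moved to the fourth argument, with -1 (no limit) passed as $limit. The variable name $parts is only for illustration. It shows why fixing the argument order alone is not enough: both capturing groups are returned, so every delimiter comes back detached from its URL.

```php
<?php
$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";

// $limit is the third argument (-1 means "no limit"); the flags go in the fourth.
$parts = preg_split(
    "~((https?:)?//)~",
    $urlsString,
    -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);

print_r($parts);
// Elements, in order:
// "http://", "http:", "foo.com/bar", "https://", "https:", "bar.com", "//", "foo.com/foo/bar"
```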

You may use preg_match_all with the following regex:

'~(?:https?:)?//.*?(?=$|(?:https?:)?//)~'

See the regex demo.

Details:

  • (?:https?:)? - https: or http:, optional (1 or 0 times)
  • // - double /
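  • .*? - any 0+ chars other than line break chars, as few as possible, up to the first position where the lookahead that follows matches
  • (?=$|(?:https?:)?//) - a positive lookahead that requires either of the two immediately to the right of the current position (without consuming it):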
    • $ - end of string
    • (?:https?:)?// - https: or http:, optional (1 or 0 times), followed with a double /
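As an aside, the breakdown above can be written directly into the pattern with the x (extended) modifier, which ignores whitespace and allows # comments. This is only a sketch: the variable names and the sample input are made up for illustration, and the pattern itself is unchanged.

```php
<?php
// Same pattern as above, laid out with the x modifier so each part carries its comment.
$pattern = '~
    (?:https?:)?          # optional "http:" or "https:"
    //                    # double slash
    .*?                   # as few chars as possible (no line breaks)
    (?=                   # lookahead: stop where one of these follows
        $                 #   end of string
      | (?:https?:)?//    #   the start of the next URL
    )
~x';

// Illustrative input, not taken from the question.
preg_match_all($pattern, "http://a.com/xhttps://b.com", $m);
print_r($m[0]);
// Array ( [0] => http://a.com/x [1] => https://b.com )
```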

Below is a PHP demo:

$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";
preg_match_all('~(?:https?:)?//.*?(?=$|(?:https?:)?//)~', $urlsString, $urls);
// preg_match_all stores the full-pattern matches in $urls[0]
print_r($urls[0]);
// => Array ( [0] => http://foo.com/bar [1] => https://bar.com [2] => //foo.com/foo/bar )
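
If this is needed in more than one place, it could be wrapped in a small helper. The function name extractUrls is hypothetical, and the second call illustrates a limitation worth keeping in mind: any // in the input is treated as the start of a new URL, including one inside a path.

```php
<?php
// Hypothetical helper (not part of the original answer): returns only the full matches.
function extractUrls(string $s): array
{
    preg_match_all('~(?:https?:)?//.*?(?=$|(?:https?:)?//)~', $s, $m);
    return $m[0]; // index 0 holds the full-pattern matches; empty array if none
}

print_r(extractUrls("http://foo.com/barhttps://bar.com//foo.com/foo/bar"));
// Array ( [0] => http://foo.com/bar [1] => https://bar.com [2] => //foo.com/foo/bar )

// Caveat: a double slash inside a path is split off as if it were a new URL.
print_r(extractUrls("http://foo.com/a//b"));
// Array ( [0] => http://foo.com/a [1] => //b )
```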