Preg_split matches more than it should

Code:

    $pattern = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
    $urls = array();
    preg_match($pattern, $comment, $urls);

    return $urls;

According to an online regex tester, this regex is correct and should be working:

http://regexr.com?35nf9

I am outputting the $links array using:

$linkItems = $model->getLinksInComment($model->comments);
//die(print_r($linkItems));
echo '<ul>';
foreach($linkItems as $link) {
    echo '<li><a href="'.$link.'">'.$link.'</a></li>';
}
echo '</ul>';

The output looks like the following:

http://google.com
http

The $model->comments looks like the following:

destined for surplus
RT#83015
RT#83617
http://google.com
https://google.com
non-link

The generated list is only supposed to contain links, and there should be no empty lines. Is there something wrong with what I did? The regex seems to be correct.

If I'm understanding right, you should use preg_match_all in your getLinksInComment function instead:

preg_match_all($pattern, $comment, $matches);

if (isset($matches[0])) {
    return $matches[0];
}
return array();    #in case there are no matches

preg_match_all gets all matches in a string (even if the string contains newlines) and puts them into the array you supply as the third argument. However, anything matched by your regex's capture groups (e.g. (http|https|ftp|ftps)) will also be put into your $matches array (as $matches[1] and so on). That's why you want to return just $matches[0] as your final array of matches.

I just ran this exact code:

$line = "destined for surplus

RT#83015

RT#83617

http://google.com

https://google.com

non-link";

$pattern = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
preg_match_all($pattern, $line, $matches);

var_dump($matches);

and got this for my output:

array(3) {
  [0]=>
  array(2) {
    [0]=>
    string(17) "http://google.com"
    [1]=>
    string(18) "https://google.com"
  }
  [1]=>
  array(2) {
    [0]=>
    string(4) "http"
    [1]=>
    string(5) "https"
  }
  [2]=>
  array(2) {
    [0]=>
    string(0) ""
    [1]=>
    string(0) ""
  }
}
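The dump shows the two capture groups filling $matches[1] and $matches[2]. If those groups are not needed, one option is to make them non-capturing with (?:...), so that only the full matches remain. The following is a minimal sketch, not the original poster's code: the pattern is a hypothetical variant of the one in the question, and the variable names are assumptions.

```php
<?php
// Hypothetical variant of the question's pattern: (?:...) groups do not
// capture, so $matches only contains index 0 (the full matches).
$pattern = '/(?:https?|ftps?):\/\/[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,3}(?:\/\S*)?/';

// Sample comment text, modelled on the one in the question.
$comment = "destined for surplus\nRT#83015\nhttp://google.com\nhttps://google.com\nnon-link";

preg_match_all($pattern, $comment, $matches);

// With no capture groups, $matches[0] is the only entry.
$links = $matches[0];
print_r($links);
```

With this variant there is nothing to filter out afterwards, although returning $matches[0] explicitly is still the clearest way to state the intent.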

Your comment is structured as multiple lines, some of which contain the URLs in which you're interested and nothing else. This being the case, you need not use anything remotely resembling that disaster of a regex to try to pick URLs out of the full comment text; you can instead split by newline, and examine each line individually to see whether it contains a URL. You might therefore implement a much more reliable getLinksInComment() thus:

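function getLinksInComment($comment) {
    $links = array();
    // Split on Unix (\n) or Windows (\r\n) line endings.
    foreach (preg_split('/\r?\n/', $comment) as $line) {
        // Skip any line that does not start with "http".
        if (!preg_match('/^http/', $line)) { continue; }
        array_push($links, $line);
    }
    return $links;
}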

With suitable adjustment to serve as an object method instead of a bare function, this should solve your problem entirely and free you to go about your day.
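As a rough illustration of that adjustment, here is a sketch of the same logic as a method. The class name CommentModel and the public $comments property are assumptions made for the example; only getLinksInComment() and $model->comments come from the question. It keeps the ^http check, so ftp links would be skipped, and it adds a trim() call (also an assumption) to tolerate stray whitespace around a line.

```php
<?php
// Hypothetical model class; the name CommentModel is an assumption.
class CommentModel
{
    public $comments = '';

    // Returns every line of $comment that starts with "http".
    public function getLinksInComment($comment)
    {
        $links = array();
        // Split on Unix (\n) or Windows (\r\n) line endings.
        foreach (preg_split('/\r?\n/', $comment) as $line) {
            $line = trim($line); // assumption: ignore surrounding whitespace
            if (preg_match('/^http/', $line)) {
                $links[] = $line;
            }
        }
        return $links;
    }
}

$model = new CommentModel();
$model->comments = "destined for surplus\nRT#83015\nRT#83617\nhttp://google.com\nhttps://google.com\nnon-link";
$linkItems = $model->getLinksInComment($model->comments);
print_r($linkItems);
```

The returned array can be passed straight to the foreach loop from the question, since it contains only the link lines.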