PHP: trim www if it exists and remove the path

I have an array of URLs, like:

array(
'http://example.co.uk/foo/bar',
'http://www.example.com/foo/bar',
'http://example.net/foo/bar')

And so on.

I am using

parse_url($url, PHP_URL_HOST);

to trim everything and keep just the domain name. It partly works, but it keeps the www part if it exists. How can I remove the 'www' when it is there? I tried to remove it explicitly from the domain name in the array, but when it resolves it reverts back to www.example.com.

So I'd like to return:

www.example.com/foo/bar > example
www.example.co.uk/foo/bar > example
example.com/foo/bar > example
example.net/foo/bar > example

The function below is not a general purpose function to get the domain name or domain part of a FQDN. Rather, it returns the first label (reading left to right) if that label is not www, and the second label if it is, as requested above.

<?php

function get_domain_from_host($host)
{
    $parts = explode('.', $host);
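    // Compare the whole first label, so a host like 'wwwexample.com' is not treated as having a www prefix.
    $domain = $parts[0] === 'www' && isset($parts[1])
        ? $parts[1]
        : $parts[0];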

    return $domain;
}

function test()
{
    $urls_wanted = array(
        'http://www.example.com/foo/bar' => 'example',
        'http://www.example.co.uk/foo/bar' => 'example',
        'http://example.com/foo/bar' => 'example',
        'http://example.net/foo/bar' => 'example'
    );

    foreach($urls_wanted as $url => $wanted) {
        $host   = parse_url($url, PHP_URL_HOST);
        $domain = get_domain_from_host($host);
        print assert($wanted == $domain);
    }
}

test(); // Outputs: 1111

Example use (copy the function above):

$url    = 'http://www.example.com/foo/bar';
$host   = parse_url($url, PHP_URL_HOST);
$domain = get_domain_from_host($host);
echo $domain; // output is 'example'.
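
One thing to watch: the question lists inputs such as www.example.com/foo/bar, and parse_url() only reports a host when the URL has a scheme (or starts with //). A minimal sketch of one way around that, assuming it is acceptable to prepend a placeholder http:// scheme; host_from_url is a hypothetical helper name, not part of the answer above:

```php
<?php

// Hypothetical helper: returns the host even when the URL has no scheme,
// by prepending a placeholder scheme before calling parse_url().
function host_from_url($url)
{
    // Without a leading "scheme://", parse_url() treats the whole string as a path.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);

    // parse_url() gives null when there is no host and false on a seriously malformed URL.
    return is_string($host) ? $host : null;
}

var_dump(host_from_url('www.example.com/foo/bar'));    // string(15) "www.example.com"
var_dump(host_from_url('http://example.net/foo/bar')); // string(11) "example.net"
```

The returned host can then be passed to get_domain_from_host() as in the example use.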

If you only want to strip the 'www', then you could use str_replace, with strpos to check whether 'www' is in your string.

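$url = ""; // your host or URL string goes here
if (strpos($url, 'www.') !== false) {
    // Replace 'www.' including the dot; replacing only 'www' would leave a leading '.'.
    $url = str_replace("www.", "", $url);
}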
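
Note that str_replace removes the substring wherever it occurs, not only at the start. If that matters, an anchored replacement is one alternative. A rough sketch, assuming the input is already a bare host; strip_leading_www is a hypothetical name:

```php
<?php

// Hypothetical helper: removes a single leading "www." label, case-insensitively.
function strip_leading_www($host)
{
    // ^ anchors the match to the start of the string, so "www." elsewhere is left alone.
    return preg_replace('/^www\./i', '', $host);
}

echo strip_leading_www('www.example.com'), "\n";    // example.com
echo strip_leading_www('WWW.example.co.uk'), "\n";  // example.co.uk
echo strip_leading_www('my.www.example.com'), "\n"; // my.www.example.com
```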

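Edit: to strip almost all of your URL (including the domain extension and www, if it exists) you could do the following. Note that it splits at the last dot in the string, so example.co.uk/foo/bar yields example.co, and a dot in the path (foo/bar.html) moves the split point into the path.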

$result = preg_split('/(?=\.[^.]+$)/', "example.com/foo/bar")[0];
if (strpos($result,'www') !== false) {
    $result = str_replace("www.", "", $result);
}
var_dump($result);

You can match using the regex ~(?:https?://)?(?:www\.)?([^\./]+)~i.

Limitations:

Please note that it will parse the valid domain www.com incorrectly and return com rather than www. This only happens when the name part itself is www (www.net, www.co.uk, etc.).
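
If the www.com case matters to you, one possible heuristic is to treat www. as a prefix only when at least two more labels follow it. This is a sketch under that assumption, not a full fix: site_name_keep_www is a hypothetical name, and www.co.uk would still come back as co. Getting that right would need a list of public suffixes.

```php
<?php

// Hypothetical variant of the regex: "www." is skipped only when the
// lookahead finds another label followed by a dot and more text after it.
function site_name_keep_www($url)
{
    $pattern = '~^(?:https?://)?(?:www\.(?=[^./]+\.[^./]))?([^./]+)~i';

    if (preg_match($pattern, $url, $match)) {
        return $match[1];
    }

    return null; // nothing that looks like a host label was found
}

var_dump(site_name_keep_www('www.com/foo/bar'));                // string(3) "www"
var_dump(site_name_keep_www('http://www.example.com/foo/bar')); // string(7) "example"
var_dump(site_name_keep_www('example.net/foo/bar'));            // string(7) "example"
```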

Autopsy:

  • ~ is our delimiter character - the regex engine knows that the next time it sees this character, everything after it is modifiers
  • (?:https?://)? an optional non-capturing group:
    • ?: makes the group non-capturing, so it gets no index in $match (if both optional groups were capturing, we'd have to return $match[3] instead of $match[1])
    • http the literal string http
    • s? the character s matched 0 to 1 time (it's optional)
    • :// the literal string ://
    • (..)? the entire group matched 0 to 1 time (optional)
  • (?:www\.)? an optional non-capturing group:
    • ?: makes the group non-capturing, same as for the scheme group
    • www\. the literal string www. - we need to escape the dot with a backslash, as the dot has special meaning in regex (any character)
    • (..)? the entire group matched 0 to 1 time (optional)
  • ([^\./]+) a capturing group:
    • [^\./]+ any character that ISN'T . or / matched 1 to infinity times
  • ~i - our closing delimiter followed by the modifier i, which makes the entire regex case-insensitive (so we match HTTPS and WwW)
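
To see what the non-capturing groups buy you, here is a quick comparison of the match arrays with and without ?:. The variable names $match and $all are just for illustration:

```php
<?php

$url = 'http://www.example.com/foo/bar';

// With non-capturing groups, the site name is the only captured group.
preg_match('~(?:https?://)?(?:www\.)?([^\./]+)~i', $url, $match);
print_r($match);
// $match[0] is 'http://www.example', $match[1] is 'example'

// With plain capturing groups, the scheme and "www." take indexes 1 and 2.
preg_match('~(https?://)?(www\.)?([^\./]+)~i', $url, $all);
print_r($all);
// $all[1] is 'http://', $all[2] is 'www.', $all[3] is 'example'
```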

Function:

<?php

function getSiteName($url) {
    if (preg_match('~(?:https?://)?(?:www\.)?([^\./]+)~i', $url, $match)) {
        return $match[1];
    }

    throw new \Exception(sprintf('Could not match URL "%s"', $url));
}

Usage:

$siteName = getSiteName('http://www.example.com/foo/bar');

DEMO:

var_dump( getSiteName( 'http://www.example.com/foo/bar' ) );
// string(7) "example"

var_dump( getSiteName( 'https://example.co.uk/foo/bar' ) ); 
// string(7) "example"

var_dump( getSiteName( 'www.example.com/foo/bar' ) );  
// string(7) "example"

var_dump( getSiteName( 'www.example.co.uk/foo/bar' ) );  
// string(7) "example"

var_dump( getSiteName( 'example.com/foo/bar' ) );  
// string(7) "example"

var_dump( getSiteName( 'example.net/foo/bar' ) );  
// string(7) "example"

var_dump( getSiteName( 'www.com/foo/bar' ) );  
// string(3) "com" (fails)