I have an array of URLs, like:
array(
'http://example.co.uk/foo/bar',
'http://www.example.com/foo/bar',
'http://example.net/foo/bar')
And so on.
I am using
parse_url($url, PHP_URL_HOST);
to trim everything and keep just the domain name. It partly works, but it keeps the www part if it exists. How can I remove the 'www' if it exists? I tried to explicitly remove it from the domain name in the array, but when it resolves it reverts back to www.example.com.
So I'd like to return:
www.example.com/foo/bar > example
www.example.co.uk/foo/bar > example
example.com/foo/bar > example
example.net/foo/bar > example
The function below is not a general-purpose function for getting the domain name or domain part of an FQDN. Rather, it returns the first label (from left to right) if that label is not www, and the second label if it is, as requested above.
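<?php
function get_domain_from_host($host)
{
    $parts = explode('.', $host);
    // Compare the whole first label, so that a host such as
    // 'wwwexample.com' is not mistaken for one with a www prefix.
    $domain = $parts[0] === 'www'
        ? next($parts)
        : current($parts);
    return $domain;
}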
function test()
{
    $urls_wanted = array(
        'http://www.example.com/foo/bar' => 'example',
        'http://www.example.co.uk/foo/bar' => 'example',
        'http://example.com/foo/bar' => 'example',
        'http://example.net/foo/bar' => 'example'
    );
    foreach ($urls_wanted as $url => $wanted) {
        $host = parse_url($url, PHP_URL_HOST);
        $domain = get_domain_from_host($host);
        print assert($wanted == $domain);
    }
}
test(); // Outputs: 1111
Example use (copy the function above):
$url = 'http://www.example.com/foo/bar';
$host = parse_url($url, PHP_URL_HOST);
$domain = get_domain_from_host($host);
echo $domain; // output is 'example'.
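To run the same idea over the whole array from the question, array_map is one option. This is only a minimal sketch: first_non_www_label and $names are made-up names, and the helper is redefined here under a different name so that the snippet runs on its own.

```php
<?php
// Hypothetical helper mirroring the approach above: return the first label
// of the host, or the second label when the first one is exactly "www".
function first_non_www_label(string $host): string
{
    $parts = explode('.', strtolower($host));
    if ($parts[0] === 'www' && count($parts) > 1) {
        return $parts[1];
    }
    return $parts[0];
}

$urls = array(
    'http://example.co.uk/foo/bar',
    'http://www.example.com/foo/bar',
    'http://example.net/foo/bar',
);

$names = array_map(function ($url) {
    // parse_url() yields null or false when no host can be extracted.
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? first_non_www_label($host) : null;
}, $urls);

print_r($names);
```

URLs without a scheme (such as www.example.com/foo/bar) have no host component as far as parse_url is concerned, so they end up as null in this sketch.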
If you only want to strip the 'www', then you could use str_replace, with strpos to check whether 'www' is in your string.
$url = "";
if (strpos($url, 'www.') !== false) {
    // Replace 'www.' including the dot, so that no stray '.' is left behind.
    $url = str_replace("www.", "", $url);
}
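Note that str_replace removes the substring wherever it occurs, not only at the start. If only a leading www. label should go, an anchored pattern is one alternative. A minimal sketch, assuming the input is a host name (for example from parse_url) rather than a full URL; strip_leading_www is a made-up name:

```php
<?php
// Hypothetical helper: drop a single leading "www." label from a host name.
function strip_leading_www(string $host): string
{
    // ^ anchors the match at the start of the string; i ignores case.
    return preg_replace('/^www\./i', '', $host);
}

$host = parse_url('http://www.example.com/foo/bar', PHP_URL_HOST);
echo strip_leading_www($host), "\n";               // example.com
echo strip_leading_www('awww.example.com'), "\n";  // awww.example.com (unchanged)
```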
Edit: to strip almost all of your url (including domain extension and www (if exists)) you could do:
$result = preg_split('/(?=\.[^.]+$)/', "example.com/foo/bar")[0];
if (strpos($result, 'www') !== false) {
    $result = str_replace("www.", "", $result);
}
var_dump($result);
You can match using the regex ~(?:https?://)?(?:www\.)?([^\./]+)~i.
Limitations:
Please note that it will parse the valid domain www.com incorrectly and return com rather than www. It will only parse domains incorrectly if the name part is www (www.net, www.co.uk etc.).
Autopsy:
- ~ : we specify our delimiter character; the next time the regex engine sees this character, everything after it is treated as modifiers
- (?:https?://)? : an optional non-capturing group:
    - ?: means that it's a non-capturing group (without it we'd have to use return $match[3])
    - http : the literal string http
    - s? : the character s matched 0 to 1 time (it's optional)
    - :// : the literal string ://
    - (..)? : the entire group matched 0 to 1 time (optional)
- (?:www\.)? : an optional non-capturing group:
    - ?: means that it's a non-capturing group (without it we'd have to use return $match[3])
    - www\. : the literal string www. (we need to escape the dot with a backslash, as the dot has special meaning in regex: any character)
    - (..)? : the entire group matched 0 to 1 time (optional)
- ([^\./]+) : a capturing group:
    - [^\./]+ : any character that ISN'T . or / matched 1 to infinity times
- ~i : our closing delimiter, followed by the modifier i, which means that the entire regex is case-insensitive (so we match HTTPS and WwW)
Function:
<?php
function getSiteName($url) {
    if (preg_match('~(?:https?://)?(?:www\.)?([^\./]+)~i', $url, $match)) {
        return $match[1];
    }
    throw new \Exception(sprintf('Could not match URL "%s"', $url));
}
Usage:
$siteName = getSiteName('http://www.example.com/foo/bar');
DEMO:
var_dump( getSiteName( 'http://www.example.com/foo/bar' ) );
// string(7) "example"
var_dump( getSiteName( 'https://example.co.uk/foo/bar' ) );
// string(7) "example"
var_dump( getSiteName( 'www.example.com/foo/bar' ) );
// string(7) "example"
var_dump( getSiteName( 'www.example.co.uk/foo/bar' ) );
// string(7) "example"
var_dump( getSiteName( 'example.com/foo/bar' ) );
// string(7) "example"
var_dump( getSiteName( 'example.net/foo/bar' ) );
// string(7) "example"
var_dump( getSiteName( 'www.com/foo/bar' ) );
// string(3) "com" (fails)
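The www.com failure can be partly worked around with a lookahead that only treats www. as a prefix when another dot follows within the host. This is a sketch of one possible variant, not a complete fix: getSiteNameLookahead is a made-up name, and names under multi-part suffixes (www.co.uk) are still parsed incorrectly, since a reliable answer there would need a list of public suffixes.

```php
<?php
// Hypothetical variant of getSiteName(): "www." is skipped only if the
// label after it is itself followed by a dot (so "www.com" keeps "www").
function getSiteNameLookahead($url) {
    $pattern = '~(?:https?://)?(?:www\.(?=[^./]+\.))?([^./]+)~i';
    if (preg_match($pattern, $url, $match)) {
        return $match[1];
    }
    throw new \Exception(sprintf('Could not match URL "%s"', $url));
}

echo getSiteNameLookahead('www.com/foo/bar'), "\n";                // www
echo getSiteNameLookahead('http://www.example.com/foo/bar'), "\n"; // example
echo getSiteNameLookahead('www.co.uk/foo/bar'), "\n";              // co (still fails)
```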