RegEx to remove http:// and www. if present, in PHP and JS

Could someone please help me with a regular expression (I need it in both PHP and JS) that removes http:// and www. from the beginning of a URL string and removes the trailing / if it's there?

For example:

  • http://www.google.com/ would be google.com
  • https://yahoo.com?page=1 would be yahoo.com?page=1
  • fancysite.com/articles/2012/ would be fancysite.com/articles/2012

Here's the code I'm using for the JS side:

row.page_href.replace(/^(https?|ftp):\/\//, '')

And here's the code I'm using for the PHP side:

$urlString = rtrim($urlString, '/');
$urlString = preg_replace('~^(?:https?://)?(?:www[.])?~i', '', $urlString);

As you can see, the JS regex currently removes only the scheme (http://, https://, or ftp://), and the PHP side needs two steps to do everything.

#(https?(://))?(www.?)?(.*)#i

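This worked just fine for me. You could change the last (.*) to something stricter that follows the RFC rules for URLs. Be aware that the dot in www.? is unescaped and optional, so it matches any single character after www (or none at all); write (www\.)? if you only want to strip a literal www. prefix.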

Outputs:

david@david-desktop ~ $ php -a
Interactive shell

php > $str = preg_replace('#(https?(://))?(www.?)?(.*)#i', '$4', 'https://www.google.ca');
php > echo $str . PHP_EOL;
google.ca
php > $str = preg_replace('#(https?(://))?(www.?)?(.*)#i', '$4', 'https://google.ca');
php > echo $str . PHP_EOL;
google.ca
php > $str = preg_replace('#(https?(://))?(www.?)?(.*)#i', '$4', 'http://google.ca');
php > echo $str . PHP_EOL;
google.ca
php > 
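For the JS side, the same idea can be tightened and done in a single replace call. This is only a minimal sketch: the function name stripUrl is made up for illustration, the pattern is anchored, the dot after www is escaped, and a second alternative handles the trailing slash.

```javascript
// Hypothetical helper; the name stripUrl is an assumption, not from the original post.
function stripUrl(url) {
  // First alternative: optional scheme and optional "www." at the very start.
  // Second alternative: a single trailing slash at the very end.
  // The g flag lets both alternatives match within one replace call.
  return url.replace(/^(?:https?:\/\/)?(?:www\.)?|\/$/gi, '');
}

console.log(stripUrl('http://www.google.com/'));       // google.com
console.log(stripUrl('https://yahoo.com?page=1'));     // yahoo.com?page=1
console.log(stripUrl('fancysite.com/articles/2012/')); // fancysite.com/articles/2012
```

The same pattern should carry over to PHP as preg_replace('~^(?:https?://)?(?:www\.)?|/$~i', '', $urlString), since preg_replace replaces every match by default.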
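function cleanUrl($url)
{
  // parse_url() only reports a host when a scheme is present, so add one if it is missing
  $full = preg_match('~^[a-z][a-z0-9+.-]*://~i', $url) ? $url : 'http://' . $url;
  if (($d = parse_url($full)) !== false && isset($d['host'])) // parseable URL with a host
  {
    return sprintf('%s%s%s',
      // ltrim() takes a character mask, so strip the literal "www." prefix with a regex
      preg_replace('/^www\./i', '', $d['host']),
      // the path may be absent, and rtrim() needs the slash as its second argument
      isset($d['path']) ? rtrim($d['path'], '/') : '',
      !empty($d['query']) ? '?'.$d['query'] : '');
  }
  return $url;
}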

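I would take advantage of parse_url(): it splits the URL into its parts so you can 'clean' each one, and it returns false for seriously malformed input. Note that it is not a full validator.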
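The same parse-then-rebuild idea can be sketched for the JS side with the built-in URL class (available in browsers and in Node.js). Treat this as an illustration under a few assumptions: the name cleanUrl just mirrors the PHP function, a default http:// scheme is prepended when none is given, and the port, credentials and fragment are dropped, as in the PHP version.

```javascript
// Hypothetical helper; the name and the default "http://" scheme are assumptions.
function cleanUrl(url) {
  let parsed;
  try {
    // The URL constructor rejects scheme-less input, so prepend a scheme when it is missing.
    const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(url);
    parsed = new URL(hasScheme ? url : 'http://' + url);
  } catch (e) {
    return url; // not parseable: hand the input back unchanged
  }
  const host = parsed.hostname.replace(/^www\./i, ''); // strip a literal "www." prefix
  const path = parsed.pathname.replace(/\/+$/, '');    // strip trailing slashes
  return host + path + parsed.search;                  // search includes the leading "?"
}

console.log(cleanUrl('http://www.google.com/'));       // google.com
console.log(cleanUrl('https://yahoo.com?page=1'));     // yahoo.com?page=1
console.log(cleanUrl('fancysite.com/articles/2012/')); // fancysite.com/articles/2012
```

One side effect worth knowing: the URL class normalizes the host name to lower case, so the result may differ in case from the input.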