How to remove links from a PHP string using preg_replace

I'm using a chat bot script. If a user name is test@test.com, the bot replies with @ <a href= mailto:test@test.com>test@test.com</a>, i.e. with a mailto link. I want the reply to be only test@test.com, without the link. I tried preg_replace and str_replace, but I don't really know the exact code to use. I tried the following, but it didn't work:

$name = preg_replace('/<a href="([^<]*)">([^<]*)<\/a>/', '', $name);

The whole code I'm using for replacements is this:

$name = str_replace (chr(0xc2).chr(0xa0), "_", $name);
$name = str_replace ("'", "", $name);
$name = str_replace ("&quot;", '"', $name);
$name = str_replace ("&amp;", "&", $name);
$name = str_replace ("&lt;", "", $name);
$name = str_replace ("&gt;", "", $name);
$name = str_replace ("&", "_", $name);
$name = str_replace ("*", "_", $name);
$name = preg_replace('/[^ \p{L}\p{N} \@ \_ \- \.\#\$\&\!]/u', '', $name);
$name = preg_replace('/<a href="([^<]*)">([^<]*)<\/a>/', '', $name);
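
Two things seem to work against that last line. This is a sketch that assumes $name holds exactly the anchor markup quoted in the question when it reaches these lines, which the question does not confirm. First, the pattern expects a double-quoted href, while the markup has none. Second, the character filter on the line before it has already removed <, >, =, : and /, so no tag is left to match.

```php
<?php
// Assumption: $name contains the markup quoted in the question.
$name = '<a href= mailto:test@test.com>test@test.com</a>';

// 1. The original pattern requires href="..." in double quotes, so it finds nothing.
$quoted = '/<a href="([^<]*)">([^<]*)<\/a>/';
echo preg_match($quoted, $name), PHP_EOL; // 0

// 2. The character filter from the question strips the tag punctuation first.
$filtered = preg_replace('/[^ \p{L}\p{N} \@ \_ \- \.\#\$\&\!]/u', '', $name);
echo $filtered, PHP_EOL; // a href mailtotest@test.comtest@test.coma
```

If that is what is happening, running the link-stripping step before the character filter would be the first thing to try.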

Why do you want to replace it? Just use preg_match() with a regex similar to this:

<a href=[^>]+>([^<]*)</a>

Overall, your code would look like this:

<?php
$regex = '#<a href=[^>]+>([^<]*)</a>#';
$email = '<a href= mailto:test@test.com>test@test.com</a>';

preg_match($regex, $email, $matches);
var_dump($matches[1]);
/*
output:
string(13) "test@test.com"
*/
?>

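If the goal is to rewrite the string in place rather than extract the address, the same pattern can also be passed to preg_replace() with the first capture group as the replacement. A minimal sketch; the text around the link in $reply is made up for illustration, and only the anchor markup comes from the question.

```php
<?php
$regex = '#<a href=[^>]+>([^<]*)</a>#';

// Hypothetical bot reply; only the <a ...> part is taken from the question.
$reply = 'Hello @ <a href= mailto:test@test.com>test@test.com</a>, welcome!';

// '$1' is the text captured between <a ...> and </a>,
// so the tag is dropped and the visible address stays.
$clean = preg_replace($regex, '$1', $reply);

echo $clean, PHP_EOL; // Hello @ test@test.com, welcome!
```
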
The answer above makes a lot of assumptions about the markup, so unfortunately it is going to fail in plenty of cases :( Here's why...

  • It assumes every link has the 'href' attribute directly after the 'a' tag name. What if there's a different attribute in front of it?
  • It assumes there are no other HTML tags inside the 'a' tag. If the link had a 'strong' tag inside it, the link would not be matched.
  • I'm pretty sure, too, that a pattern built on a greedy quantifier such as .+ will remove everything between the first link and the second when there's more than one link in the string, because nothing stops it from being greedy.
  • Finally, it hasn't been told to be case-insensitive. This means that if the link had A HREF in it, that wouldn't be found either.
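
The points above can be checked with a few sample strings. This sketch assumes the bullets refer to a pattern like the one in the first answer. The inputs are invented, and the greedy pattern near the end is a hypothetical one used only to show the greediness problem.

```php
<?php
$pattern = '#<a href=[^>]+>([^<]*)</a>#';

// Hypothetical inputs, one per bullet.
$cases = [
    'attribute before href' => '<a class="x" href="mailto:a@b.com">a@b.com</a>',
    'nested strong tag'     => '<a href="mailto:a@b.com"><strong>a@b.com</strong></a>',
    'upper-case tag'        => '<A HREF="mailto:a@b.com">a@b.com</A>',
];

$results = [];
foreach ($cases as $label => $html) {
    $results[$label] = preg_match($pattern, $html); // 1 = matched, 0 = missed
    echo $label, ': ', $results[$label] ? 'matched' : 'missed', PHP_EOL;
}

// Greediness: a greedy .+ swallows the text between two links,
// while the lazy .+? stops at the first possible closing tag.
$two    = '<a href=x>one</a> keep <a href=y>two</a>';
$greedy = preg_replace('#<a.+>.+</a>#', '', $two);
$lazy   = preg_replace('#<a.+?>.+?</a>#', '', $two);

var_dump($greedy); // string(0) ""
var_dump($lazy);   // string(6) " keep "
```

All three cases are reported as missed, and only the lazy version leaves the word between the two links intact.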

I'm not saying my solution is 100% foolproof, but I've tested it in the scenarios I'm aware of and I think it's an upgrade on the answer above:

$email = preg_replace("/<a.+?href.+?>.+?<\/a>/is","",$email);

The 'i' modifier makes the match case-insensitive. The 's' modifier lets . match newlines too, so links that are broken across lines are still matched.
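
Note that this pattern deletes the link text together with the tag. Since the question asks to keep test@test.com, one variation is to capture the inner content and substitute it back. The second pattern below is a sketch of that idea and not part of the original answer; the sample string is invented.

```php
<?php
// Hypothetical input: upper-case tag, extra attribute, line break inside the link.
$email = "Reply to <A class=\"u\" HREF=\"mailto:test@test.com\">test@\ntest.com</A> now";

// Pattern from the answer: removes the link and its text.
$removed = preg_replace("/<a.+?href.+?>.+?<\/a>/is", "", $email);

// Variation (assumption, not from the answer): keep the inner content via '$1'.
// \b stops the pattern from matching other tags that merely start with "a".
$kept = preg_replace('/<a\b[^>]*>(.*?)<\/a>/is', '$1', $email);

var_dump($removed); // "Reply to  now"
var_dump($kept);    // "Reply to test@\ntest.com now", line break preserved
```

If no markup at all should survive, PHP's built-in strip_tags() may be a simpler option than a hand-written pattern.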

I'd always recommend populating a string with different links in different formats, different orders, etc. That's always the best way to test that things work. Assuming everyone types links as <a href="...">My test</a> is going to get you into lots of sticky situations :)
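
One way to follow that advice is to run a small table of sample links through the pattern and look at what comes back. The samples and labels below are invented for illustration.

```php
<?php
$pattern = "/<a.+?href.+?>.+?<\/a>/is"; // pattern from the answer above

// Hypothetical samples: different casing, attribute order, nesting, line breaks.
$samples = [
    'plain'       => '<a href="http://example.com">My test</a>',
    'upper case'  => '<A HREF="http://example.com">My test</A>',
    'extra attr'  => '<a class="x" href="http://example.com"><strong>My test</strong></a>',
    'line breaks' => "<a\nhref='http://example.com'>My\ntest</a>",
    'two links'   => 'before <a href=http://example.com>one</a> middle <a href=http://example.com>two</a> after',
    'abbr first'  => '<abbr title="x">y</abbr> <a href="http://example.com">z</a>',
];

$outputs = [];
foreach ($samples as $label => $html) {
    // Store what is left after removing every match.
    $outputs[$label] = preg_replace($pattern, '', $html);
    echo str_pad($label, 12), ' => ', var_export($outputs[$label], true), PHP_EOL;
}
```

With these samples the first four come back empty and 'two links' keeps the text around the links. 'abbr first' also comes back empty, though: <a.+?href starts matching at <abbr, so the abbreviation is removed along with the link. That is exactly the kind of case a mixed test string exposes.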

Good luck!