I use the following regexp in a PHP function to replace URLs with proper HTML links:
return preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="$1" target="_blank">$1</a>', $s);
But when $s contains a string like
<li>http://www.link.com/something.pdf</li>
the function returns
<li><a href="http://www.link.com/something.pdf</li">http://www.link.com/something.pdf</li></a></li>
Does anyone know how to modify the regexp to get the intended string, i.e.
<li><a href="http://www.link.com/something.pdf">http://www.link.com/something.pdf</a></li>
without excluding from the replacement the parts of the URL introduced by '%', '?' or '&'?
Really easy solution:
return '<li>'.preg_replace('@(https?://([-\w.]+[-\w])+(:\d+)?(/([\w.~:/?#\[\]\@!$&\'()*+,;=%-]*)?)?)@', '<a href="$1" target="_blank">$1</a>', $s).'</li>';
If you really want a regex:
return preg_replace('@(https?://([-\w.]+[-\w])+(:\d+)?(/([\w.~:/?#\[\]\@!$&\'()*+,;=%-]*)?)?)@', '<a href="$1" target="_blank">$1</a>', $s);
Your pattern is not sufficient to catch all links, but in any case, instead of \S+ you might want [^\s<>]+, because the former matches every non-whitespace character, including the < and > of the closing tag. The same applies to [^\.\s]: make it [^\s<>.]. You don't need to escape the dot inside a character class, so my only addition to this group is the greater-than and less-than signs.
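For illustration, here is a minimal sketch of the question's original pattern with just those two character classes swapped in. The function name linkify is made up for this example, and the pattern is still not a complete URL matcher; it only shows that < and > no longer get swallowed.

```php
<?php
// Hypothetical helper; the name "linkify" is not from the question.
// The pattern is the question's original with two changes:
//   \S+      -> [^\s<>]+   (query string stops at whitespace or angle brackets)
//   [^\.\s]  -> [^\s<>.]   (last character may not be whitespace, < > or a dot)
function linkify(string $s): string
{
    $pattern = '@(https?://([-\w.]+[-\w])+(:\d+)?(/([\w/_.#-]*(\?[^\s<>]+)?[^\s<>.])?)?)@';

    // preg_replace returns null on error; fall back to the untouched input.
    return preg_replace($pattern, '<a href="$1" target="_blank">$1</a>', $s) ?? $s;
}

echo linkify('<li>http://www.link.com/something.pdf</li>'), "\n";
```

With the sample input from the question, this should print the link wrapped inside the <li> element, with the closing </li> left outside the href and the link text. A query string such as ?x=1&y=2%20z is still kept as part of the URL.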