I want to prevent spaces in hyperlinks on a UGC site. I have written a regular expression that works well, except that it does not remove the trailing space from the link URL and the anchor text.
Here is my code:
$text = '< a href = " http://www.examplesite.com/ "> Example site </a>';
$text = preg_replace('#(<(\s+)*a(\s+)*href(\s+)*=(\s+)*("|\')(\s+)*([^"]+)("|\')>(\s+)*([^<]+)(\s+)*</a>)#','<a href="$8">$11</a> ',$text);
Output (the trailing spaces are still there):
<a href="http://www.examplesite.com/ ">Example site </a>
URLs can also contain spaces, e.g. http://www.examplesite.com/blog/a page with space.html
Try this:
// Lazy (.*?) groups stop before trailing spaces and still allow spaces inside the URL.
$text = preg_replace("{<\s*a\s+href\s*=\s*(\"|')\s*(.*?)\s*\\1\s*>\s*(.*?)\s*</a>}","<a href='$2'>$3</a>",$text);
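Another way to look at it, as a rough sketch rather than a drop-in fix: match the link loosely and let trim() clean the captured pieces in a callback, so the regex itself does not have to deal with trailing spaces. The function name tidyAnchors is made up for this example, and it assumes simple links where href is the only attribute.

```php
<?php
// Hypothetical helper (not from the thread): normalizes simple
// <a href="...">text</a> links where href is the only attribute.
function tidyAnchors(string $html): string
{
    // Group 1: quote character, group 2: raw href value, group 3: raw anchor text.
    $pattern = '#<\s*a\s+href\s*=\s*(["\'])(.*?)\1\s*>(.*?)<\s*/\s*a\s*>#is';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // trim() strips leading and trailing whitespace only,
            // so spaces inside the URL or the anchor text survive.
            $href  = trim($m[2]);
            $label = trim($m[3]);
            // Reuse the original quote character around the href.
            return '<a href=' . $m[1] . $href . $m[1] . '>' . $label . '</a>';
        },
        $html
    );
}

echo tidyAnchors('< a href = " http://www.examplesite.com/ "> Example site </a>'), "\n";
```

Spaces in the middle of a URL such as http://www.examplesite.com/blog/a page with space.html are left alone; only the edges are trimmed.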
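Try this to remove extra spaces:
function RemoveExtraSpaces($str)
{
    // Collapse runs of spaces until no double space remains.
    // Compare with false: strpos() returns 0 (falsy) for a match at the start.
    while (strpos($str, "  ") !== false)
    {
        $str = str_replace("  ", " ", $str);
    }
    return $str;
}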
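If the goal is to squeeze runs of whitespace rather than loop over str_replace, a single preg_replace call can probably do it. This is only a sketch; collapseSpaces is a made-up name, and it assumes you also want the ends trimmed.

```php
<?php
// Hypothetical helper (not from the thread): collapses whitespace runs
// to a single space and trims both ends.
function collapseSpaces(string $str): string
{
    // \s+ matches one or more whitespace characters (spaces, tabs, newlines).
    return trim(preg_replace('/\s+/', ' ', $str));
}

echo collapseSpaces("  Example   site \t here "), "\n";
```

Note that this touches every whitespace run in the string, so it is better applied to the captured URL or anchor text than to the whole HTML.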
I am no expert in regex, but it seems you need a way to backtrack: you read all the way up to the closing ",
but then you have to back up to the last non-space character. I have no idea how to do that, so once you have your semi-cleaned string
I would either a) use str_replace or b) write a second regex:
$str = str_replace(" '>","'>", $str);
$str = str_replace(" \">","\">", $str);
$str = str_replace(" </a>","</a>", $str);
Repeat these replacements until none can be made anymore. It's primitive, I know, but it should do the job.
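The "repeat until nothing changes" idea could look roughly like this. str_replace accepts an optional fourth argument that receives the number of replacements made, which gives the loop its stop condition. The function name stripTrailingSpaces is made up, and the sketch assumes the first regex has already normalized the tag so that a quote is followed directly by >.

```php
<?php
// Hypothetical helper (not from the thread): removes spaces that sit
// directly before a closing quote + bracket or before </a>.
function stripTrailingSpaces(string $str): string
{
    $search  = [" '>", ' ">', ' </a>'];
    $replace = ["'>", '">', '</a>'];

    do {
        // $count receives the number of replacements made in this pass.
        $str = str_replace($search, $replace, $str, $count);
    } while ($count > 0);

    return $str;
}

echo stripTrailingSpaces('<a href="http://www.examplesite.com/  ">Example site   </a>'), "\n";
```

Each pass removes one space per location, so the loop runs once per trailing space plus a final pass that finds nothing.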