Regex pattern does not match at the start of the line

I have this pattern:

/([^>'"])(http|ftp)+(s)?:(\/\/)((\w|\.)+)(\/)?(\S+)?/

When using this as the subject:

http://www.google.com <a href="http://www.google.com">http://www.google.com</a> http://www.google.com

It matches the last http://www.google.com but not the first one at the start of the line. How can I get it to match the first one at the start of the line too? (and continue to not match inside the anchor tag)

It's because [^'">] means any one character that isn't ', " or >. The class still has to consume a character, and there is no character before the http at the start of the line, which is why it's not matching there.
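To make that concrete, here is a minimal sketch in Python (my choice of language; the question does not say which regex engine is in use, so behaviour elsewhere may differ slightly). The names ORIGINAL and SUBJECT are just illustrative. It runs the question's pattern against the question's subject and shows that the only match is the last URL, together with the space the character class consumed.

```python
import re

# The question's pattern, unchanged apart from dropping the / delimiters.
ORIGINAL = re.compile(r"""([^>'"])(http|ftp)+(s)?:(\/\/)((\w|\.)+)(\/)?(\S+)?""")

# The subject line from the question.
SUBJECT = ('http://www.google.com '
           '<a href="http://www.google.com">http://www.google.com</a> '
           'http://www.google.com')

# Each full match includes the one character consumed by [^>'"].
matches = [m.group(0) for m in ORIGINAL.finditer(SUBJECT)]
print(matches)  # [' http://www.google.com']
```

Note the leading space in the result: the class matched the space before the last URL, and nothing comparable exists before the first one.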

One possibility (not necessarily the best) is to use something like:

(([^'">])(http))|(^http)

(either of two possible patterns). This matches everything your current pattern does, plus "http" at the start of the line.

I don't doubt there are trickier ways to do this with the more advanced regex features like look-ahead, negative look-behind or the little-known surreptitious look-under (a), but I prefer simplicity most of the time.


(a) Some features alluded to in this answer may not, in fact, exist :-)
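Of the features mentioned above, negative look-behind does exist in many engines. As a hedged sketch, assuming Python's re module (the question's engine is unknown, and look-behind support varies), the leading group can be swapped for (?<![>'"]), which checks the previous character without consuming it and also succeeds when there is no previous character. LOOKBEHIND and SUBJECT are illustrative names, not anything from the question.

```python
import re

# Same body as the question's pattern, but the leading ([^>'"]) group is
# replaced by a negative look-behind, which consumes no characters.
LOOKBEHIND = re.compile(r"""(?<![>'"])(http|ftp)+(s)?:(\/\/)((\w|\.)+)(\/)?(\S+)?""")

# The subject line from the question.
SUBJECT = ('http://www.google.com '
           '<a href="http://www.google.com">http://www.google.com</a> '
           'http://www.google.com')

found = [m.group(0) for m in LOOKBEHIND.finditer(SUBJECT)]
print(found)  # ['http://www.google.com', 'http://www.google.com']
```

A side effect of this variant is that the matched text no longer carries the preceding character, and the group numbers shift down by one compared with the original pattern.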

/(^|[^>'"])(http|ftp)+(s)?:(\/\/)((\w|\.)+)(\/)?(\S+)?/ will do it for you. A ^ as the first character inside [] negates the character class. To match the start of the line, the ^ has to sit outside the [], here as an alternative to the class at the beginning of the regex.
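A quick check of this pattern, sketched in Python under the assumption that the engine behaves like Python's re module here (FIXED, SUBJECT, prefixes and urls are illustrative names). Group 1 is either empty, at the start of the string, or the single character that preceded the URL, so it is stripped to recover the URL itself.

```python
import re

# The pattern from this answer, without the / delimiters.
FIXED = re.compile(r"""(^|[^>'"])(http|ftp)+(s)?:(\/\/)((\w|\.)+)(\/)?(\S+)?""")

# The subject line from the question.
SUBJECT = ('http://www.google.com '
           '<a href="http://www.google.com">http://www.google.com</a> '
           'http://www.google.com')

# Group 1 holds what matched before the scheme: '' at the start, ' ' later on.
prefixes = [m.group(1) for m in FIXED.finditer(SUBJECT)]
# Drop that prefix from the full match to get just the URL.
urls = [m.group(0)[len(m.group(1)):] for m in FIXED.finditer(SUBJECT)]

print(prefixes)  # ['', ' ']
print(urls)      # ['http://www.google.com', 'http://www.google.com']
```

In Python, ^ only matches at the very start of the string unless re.MULTILINE is passed, so a multi-line subject would need that flag for URLs at the start of later lines.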

Try ([^'">])?(http) (untested).
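Since this suggestion is marked untested, here is a small check, again assuming Python's re module and using illustrative names (OPTIONAL, SUBJECT, hits). Because the leading group is optional, the engine can simply skip it, so the pattern appears to match every "http" in the subject, including the two inside the anchor tag that the question wants to exclude.

```python
import re

# The suggested pattern: the excluded-character group is made optional.
OPTIONAL = re.compile(r"""([^'">])?(http)""")

# The subject line from the question.
SUBJECT = ('http://www.google.com '
           '<a href="http://www.google.com">http://www.google.com</a> '
           'http://www.google.com')

# Record, for each match, what group 1 captured (None when it was skipped).
hits = [m.group(1) for m in OPTIONAL.finditer(SUBJECT)]
print(len(hits))  # 4
print(hits)       # [None, None, None, ' ']
```

So this variant does pick up the URL at the start of the line, but at the cost of also matching inside the href attribute and the link text.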