Remove all http and https from HTML, but not from placeholders

I want to remove all http: and https: from my HTML files, but leave placeholder="http: and placeholder="https: untouched. I tried the following pattern, but it removes every http: and https:

/(?!placeholder=")(http:|https:)/

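You need to replace the lookahead with a lookbehind. The lookahead (?!placeholder=") is checked at the position where http starts, and the text at that position can never begin with placeholder=", so the check always passes. A lookbehind checks the text immediately to the left of the match instead. You may also reduce the alternation to a simple https?: pattern, where s? matches one or zero s: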

'/(?<!placeholder=")https?:/'
    ^                   ^^
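
To compare the two patterns side by side, here is a minimal sketch in Python. Python is used only for illustration: the quoted /.../ delimiters suggest a PCRE-style host such as PHP, which is an assumption, and Python's re module takes the pattern without those delimiters. The sample HTML string is made up.

```python
import re

# Hypothetical sample input, not taken from the question.
html = '<a href="http://example.com">x</a> <input placeholder="https://your-site">'

# Original attempt: the negative lookahead is tested at the position where
# "http" starts. The text there never begins with 'placeholder="', so the
# lookahead always succeeds and every occurrence is removed.
lookahead = re.compile(r'(?!placeholder=")(http:|https:)')

# Suggested fix: the negative lookbehind inspects the text to the LEFT of
# the match, so a scheme right after 'placeholder="' is left alone.
lookbehind = re.compile(r'(?<!placeholder=")https?:')

stripped_all = lookahead.sub('', html)
stripped_safe = lookbehind.sub('', html)

print(stripped_all)   # placeholder value loses its scheme too
print(stripped_safe)  # placeholder value keeps "https:"
```

Only the href value loses its scheme with the lookbehind version, while the lookahead version strips both.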

If you want to make sure the placeholder is matched as a whole word, add a word boundary:

'/(?<!\bplaceholder=")https?:/'
      ^^

If there must be a whitespace character before placeholder, replace \b with \s.
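
The difference between the \b and \s variants shows up on attribute names that merely end in placeholder. The following is a rough sketch in Python (used for illustration only, with the pattern written without the /.../ delimiters). The attribute names data-placeholder and xplaceholder are hypothetical test inputs, not something from the question.

```python
import re

# Hypothetical inputs: the second and third attribute names are made up
# to probe the boundary behaviour.
samples = [
    '<input placeholder="http://a">',
    '<input data-placeholder="http://b">',
    '<input xplaceholder="http://c">',
]

# \b only requires a word/non-word transition before "placeholder",
# so a preceding hyphen still counts as a boundary.
with_boundary = re.compile(r'(?<!\bplaceholder=")https?:')

# \s requires an actual whitespace character before "placeholder".
with_space = re.compile(r'(?<!\splaceholder=")https?:')

boundary_results = [with_boundary.sub('', s) for s in samples]
space_results = [with_space.sub('', s) for s in samples]

for original, b, s in zip(samples, boundary_results, space_results):
    print(original, '|', b, '|', s)
```

With \b, both placeholder and data-placeholder values are protected, because the hyphen is a non-word character. With \s, only the attribute preceded by whitespace is protected.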

Details

  • (?<!\bplaceholder=") - a negative lookbehind: a location inside a string that is not immediately preceded by the whole word placeholder followed by ="
  • http - a http substring
  • s? - an optional s
  • : - a colon.