What is the regex code for a meta description?

I am looking for a regex that matches a meta description tag.

#<meta|name="description|".*content|="([^"]+)">#

That is what I have, but it doesn't match uppercase letters, and I discovered that some tags are written like META DESCRIPTION =, etc.

Is there a different pattern, or a way to change this one so that it also matches uppercase content?

Add the i flag after your closing # delimiter.

Like this:

#<meta|name="description|".*content|="([^"]+)">#i

That makes your regular expression case-insensitive. Read more about flags here.
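
As a quick check of what the flag changes, here is a minimal sketch using PHP's preg_match. It uses a simplified pattern rather than the one from the question, and the sample tag and variable names are made up for illustration.

```php
<?php
// Sample tag written in uppercase (made up for this illustration).
$html = '<META NAME="DESCRIPTION" CONTENT="Foo Bar">';

// Without the i flag, the lowercase literals do not match uppercase input.
$caseSensitive = preg_match('#<meta name="description"#', $html);

// With the i flag after the closing delimiter, letter case is ignored.
$caseInsensitive = preg_match('#<meta name="description"#i', $html);

var_dump($caseSensitive, $caseInsensitive);
```

The first call returns 0 and the second returns 1, so the flag only changes how letters are compared; it does not repair anything else in a pattern.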

Your regex <meta|name="description|".*content|="([^"]+)"> is broken. Because | is the alternation operator, it means:

  • <meta
    OR
  • name="description
    OR
  • " followed by anything followed by content
    OR
  • =" followed by at least one character that is not " followed by ">
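
To see the effect of those alternatives, the sketch below runs the pattern from the question (with the i flag added) against a sample tag. The sample tag is an assumption made up for the demonstration.

```php
<?php
// The pattern from the question, with the i flag added.
$broken = '#<meta|name="description|".*content|="([^"]+)">#i';
$html = '<meta name="description" content="foo bar baz">';

// preg_match reports a match, but only because the first alternative
// ("<meta") already matches at the start of the tag.
$found = preg_match($broken, $html, $m);

var_dump($found);        // 1: something matched
var_dump($m[0]);         // the matched text is just "<meta"
var_dump(isset($m[1]));  // false: the capture group took no part in the match
```

So the pattern appears to work, yet the description is never captured.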

Warning!

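Parsing HTML with regular expressions is a very bad idea: attribute order, whitespace, quoting style and letter case can all vary in valid markup, and a pattern has to anticipate every combination.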
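
For comparison, here is a rough sketch of the parser-based route using PHP's DOMDocument. It assumes the DOM extension is enabled; the sample document and the variable names are made up for illustration.

```php
<?php
// Sample document (made up); note the uppercase tag and reversed attributes.
$html = '<html><head><META CONTENT="foo bar baz" NAME="Description"></head>'
      . '<body></body></html>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$description = null;
foreach ($doc->getElementsByTagName('meta') as $meta) {
    // Compare the value of the name attribute case-insensitively.
    if (strcasecmp($meta->getAttribute('name'), 'description') === 0) {
        $description = $meta->getAttribute('content');
        break;
    }
}

var_dump($description);
```

The parser deals with letter case, attribute order and whitespace, so none of that has to be encoded in a pattern.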

Regex alternative for training purposes

But if you want to experiment for practice, start from this and improve it:

#<meta name="description" content="([^"]+)">#i

which is case-insensitive and does what you think it does.
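
A minimal usage sketch with preg_match, assuming the page source is already in a string (the sample HTML here is made up):

```php
<?php
$pattern = '#<meta name="description" content="([^"]+)">#i';
$html = '<html><head><META NAME="description" CONTENT="foo bar baz"></head></html>';

// $m[0] holds the whole matched tag, $m[1] the captured content value.
if (preg_match($pattern, $html, $m)) {
    $description = $m[1];
} else {
    $description = null;
}

var_dump($description);
```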

False negatives

Beware that it won't match valid elements like this:

<meta name="description"      content="foo bar baz">

or

<meta
   name="description"
   content="foo bar baz">

or

<meta content="foo bar baz" name="description">
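
The three cases above can be checked with a short sketch like this one. The array keys are just labels chosen for the illustration.

```php
<?php
$pattern = '#<meta name="description" content="([^"]+)">#i';

// The three valid tags listed above.
$samples = [
    'extra spaces'   => '<meta name="description"      content="foo bar baz">',
    'line breaks'    => "<meta\n   name=\"description\"\n   content=\"foo bar baz\">",
    'reversed order' => '<meta content="foo bar baz" name="description">',
];

$results = [];
foreach ($samples as $label => $tag) {
    // preg_match returns 1 on a match and 0 when nothing matches.
    $results[$label] = preg_match($pattern, $tag);
}

print_r($results);
```

All three entries come back as 0, because the pattern insists on exactly one space between the parts and on name coming before content.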

You can use it like this:

/<meta[^>]*name=[\"\']description[\"\'][^>]*content=\"([^\"]*)\"[^>]*>/i

It also works on compressed (minified) HTML. Note that it still expects name to appear before content.
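
Here is a small sketch of how a pattern of this shape behaves. It is a variant written for a single-quoted PHP string, with the quote classes spelled as ["\']. The sample tags are made up.

```php
<?php
// Variant of the pattern above, written for a single-quoted PHP string.
$pattern = '/<meta[^>]*name=["\']description["\'][^>]*content="([^"]*)"[^>]*>/i';

// [^>]* also matches line breaks and extra attributes inside the tag.
$multiLine = "<meta\n   name=\"description\"\n   content=\"foo bar baz\">";
$okMultiLine = preg_match($pattern, $multiLine, $m1);

// Single-quoted name value, uppercase names, an extra attribute in between.
$mixed = '<META NAME=\'description\' id="x" CONTENT="foo bar baz">';
$okMixed = preg_match($pattern, $mixed, $m2);

// content before name is still not matched.
$reversed = '<meta content="foo bar baz" name="description">';
$okReversed = preg_match($pattern, $reversed);

var_dump($okMultiLine, $okMixed, $okReversed);
```

The first two calls return 1 with "foo bar baz" in the first capture group; the reversed tag returns 0.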

Check this PHP function to easily get all meta details, including the description.
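
The link to that function is not preserved here. As one possibility (an assumption, not necessarily the function meant above), PHP's built-in get_meta_tags() returns every named meta tag of a page as an array. A minimal sketch, using a made-up sample page written to a temporary file:

```php
<?php
// get_meta_tags() takes a file name or URL rather than an HTML string,
// so write a small sample page (made up) to a temporary file first.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, '<html><head>'
    . '<meta name="description" content="foo bar baz">'
    . '<meta name="keywords" content="regex, php">'
    . '</head><body></body></html>');

// The result is keyed by each tag's name attribute.
$tags = get_meta_tags($file);
unlink($file);

print_r($tags);
```

Here $tags['description'] holds "foo bar baz", with no regular expression involved.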