I am looking for a regex code for a meta description tag.
#<meta|name="description|".*content|="([^"]+)">#
That is what I have, but it doesn't match capital letters, and I discovered that some tags are written like META DESCRIPTION =, etc.
Is there a different pattern, or a way to change this one so that it also matches upper-case tags?
Add the i flag after your last #.
Like this:
#<meta|name="description|".*content|="([^"]+)">#i
That will tell your regular expression to be case insensitive. Read more about flags here.
Your regex <meta|name="description|".*content|="([^"]+)"> is broken. Because | means alternation, it matches any one of:
<meta
or name="description
or " followed by anything followed by content
or =" followed by at least one character that is not " followed by ">
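To see the effect of the alternation, here is a minimal sketch that runs the pattern from the question against a tag that has nothing to do with descriptions. The sample HTML string is made up for illustration.

```php
<?php
// The asker's pattern, unchanged: four alternatives separated by |.
$pattern = '#<meta|name="description|".*content|="([^"]+)">#';

// Illustrative input: a meta tag with no description at all.
$html = '<meta charset="utf-8">';

$found = preg_match($pattern, $html, $m);

// $found is 1, because the first alternative (<meta) matches on its own.
// $m[0] is the matched text, and there is no $m[1]: the capture group
// belongs to the last alternative, which took no part in the match.
var_dump($found, $m);
```

So the pattern reports a match for any meta tag, and the captured description is usually missing.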
Let me say that parsing HTML with regular expressions is a very bad idea.
But if you want to try something out for training, start improving this:
#<meta name="description" content="([^"]+)">#i
which is case-insensitive and does what you think it does.
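A short sketch of how that pattern might be used from PHP with preg_match. The upper-case sample tag is invented for the example, and the variable names are only illustrative.

```php
<?php
// Corrected pattern: one literal tag layout, one capture group, i flag.
$pattern = '#<meta name="description" content="([^"]+)">#i';

// Illustrative input written in capitals, as in the question.
$html = '<META NAME="DESCRIPTION" CONTENT="Foo Bar">';

$description = null;
if (preg_match($pattern, $html, $m)) {
    // Group 1 holds the text between the quotes of the content attribute.
    $description = $m[1];
}
```

The i flag only affects how the literal parts (meta, name, description, content) are compared; the captured text keeps its original case.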
Beware that it won't match valid elements like this one, which has extra whitespace between the attributes:
<meta  name="description"   content="foo bar baz">
or
<meta
name="description"
content="foo bar baz">
or
<meta content="foo bar baz" name="description">
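Variants like these are where an HTML parser is more robust than a regex. The following is only a sketch, assuming PHP's bundled DOM extension is available; metaDescription is a hypothetical helper name, not something from this thread.

```php
<?php
// Hypothetical helper: returns the content of the first
// <meta name="description"> element, or null if there is none.
function metaDescription(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them,
    // since real-world markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Compare the attribute value case-insensitively.
        if (strtolower($meta->getAttribute('name')) === 'description') {
            return $meta->getAttribute('content');
        }
    }
    return null;
}

// Attribute order and line breaks no longer matter.
$a = metaDescription("<meta\n name=\"description\"\n content=\"foo bar baz\">");
$b = metaDescription('<meta content="foo bar baz" name="description">');
$c = metaDescription('<META NAME="Description" CONTENT="foo bar baz">');
```

The parser normalises tag and attribute names, so the helper only has to deal with the case of the attribute value itself.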
You can use it like this:
/<meta[^>]*name=[\"\']description[\"\'][^>]*content=[\"]([^\"]*)[\"][^>]*>/i
It works for compressed (minified) HTML code too.
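As a rough sketch, a pattern along these lines could be called as below. The minified sample markup is invented, and the pattern is written for a single-quoted PHP string, so only the single quote inside the character class needs escaping.

```php
<?php
// Allows other attributes around name= and content=, and either quote
// style around "description"; content must use double quotes.
$pattern = '/<meta[^>]*name=["\']description["\'][^>]*content="([^"]*)"[^>]*>/i';

// Illustrative minified HTML with no whitespace between elements.
$html = '<html><head><meta charset="utf-8">'
      . '<meta name=\'description\' content="foo bar baz"/></head></html>';

$description = preg_match($pattern, $html, $m) ? $m[1] : null;

// The pattern still expects name= before content=.
$reversed = preg_match($pattern, '<meta content="x" name="description">');
```

Here $description ends up holding the content value, while $reversed is 0, so a tag with the attributes in the opposite order is still missed.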
Check this PHP function to easily get all meta details, including the description.