I have retrieved an HTML page using cURL, and now I want to extract the content of a specific meta tag, e.g. <meta name="ids" content="123nsdfsdfAS">.
Here is what I did:
function file_get_contents_curl($url)
{
    $agent = 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$html = file_get_contents_curl("http://example.com");
So I want to extract the content of a specific meta tag, e.g. <meta name="ids" content="123nsdfsdfAS">, from $html using preg_match_all, preg_match, or any related function and regular expression. I wrote a regex, but it did not work well, so I have not included it here.
Well, here it's fairly easy:
/<meta[^>]+>/
will match any meta tag.
/<meta name="ids"[^>]+>/
will match only the meta tag with the name ids.
If you only want the content of this tag, capture it with a group:
/<meta name="ids" content="([^"]+)">/
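As a minimal sketch of how that last pattern could be used with preg_match, assuming the tag is written exactly as in the question (name before content, double quotes, no extra attributes). The $html string below is a stand-in for the page returned by file_get_contents_curl(), and $ids is just an illustrative variable name:

```php
<?php
// Sample markup standing in for the $html fetched via cURL (assumption).
$html = '<html><head><meta name="ids" content="123nsdfsdfAS"></head></html>';

// Capture group 1 holds the value of the content attribute.
if (preg_match('/<meta name="ids" content="([^"]+)">/', $html, $matches)) {
    $ids = $matches[1];
} else {
    // Tag missing, or its attributes are ordered or quoted differently.
    $ids = null;
}

echo $ids, PHP_EOL; // prints 123nsdfsdfAS
```

If the real page writes the attributes in another order or uses single quotes, this pattern will not match and $ids stays null.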
Try this: <meta name="ids"(.*?)>
That is the easy way: $1 will give you the tag's remaining attributes.
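A short sketch of what that looks like in PHP, where the first capture group is available as index 1 of the matches array. The sample $html and the variable names are assumptions for illustration. Note that the group captures the raw attribute text, so a second step is still needed to isolate the content value:

```php
<?php
// Sample markup standing in for the fetched page (assumption).
$html = '<meta name="ids" content="123nsdfsdfAS">';

$attributes = null;
$content = null;

if (preg_match('/<meta name="ids"(.*?)>/', $html, $m)) {
    // Group 1 is everything between name="ids" and the closing ">".
    $attributes = trim($m[1]);

    // Second pass: pull the content value out of the attribute text.
    if (preg_match('/content="([^"]*)"/', $attributes, $c)) {
        $content = $c[1];
    }
}

echo $attributes, PHP_EOL; // prints content="123nsdfsdfAS"
echo $content, PHP_EOL;    // prints 123nsdfsdfAS
```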
The regex below matches any meta element:
<meta(?: [^>]+)?>
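Examples:
<meta>
<meta id="12"> (or with any other attribute)
The next regex matches a meta element that has an id attribute and captures its value:
<meta(?: [^>]+)? id="([^"]*)"[^>]*>
Examples:
<meta id="123">
<meta id="123" content="cnt">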
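To illustrate the id pattern above, here is a small sketch using preg_match_all to collect every id value from a string. The sample markup and variable names are made up for the example. The pattern expects double-quoted values and a space before id:

```php
<?php
// Sample markup (assumption): three meta tags with an id, one without.
$html = '<meta id="123"><meta id="456" content="cnt">'
      . '<meta name="x" id="789"><meta>';

// $matches[1] collects the captured id values of all matching tags.
$count = preg_match_all(
    '/<meta(?: [^>]+)? id="([^"]*)"[^>]*>/',
    $html,
    $matches
);

$metaIds = $matches[1];

echo $count, PHP_EOL;               // prints 3
echo implode(',', $metaIds), PHP_EOL; // prints 123,456,789
```

The bare <meta> tag is skipped because it has no id attribute, and the id is found regardless of whether other attributes come before or after it.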