Scanning meta tags with my web crawler (PHP)

I have a PHP web crawler and I would love to add the get_meta_tags() function to it. The crawler scans a given web page for all of its URLs, and so on. Is it possible to add get_meta_tags() to the crawler so that it also collects the meta tags from the URLs it scans?

<?php
session_start();

$domain = "www.ebay.com";

// First run: no page is stored in the session yet, so start from the home page.
if (empty($_SESSION['page']))
{
    $original_file = file_get_contents("http://" . $domain . "/");

    $_SESSION['i'] = 0;

    $connect = mysql_connect("xxxxx", "xxxxx", "xxxx");

    if (!$connect)
    {
        die("MySQL could not connect!");
    }

    $DB = mysql_select_db('xxxx');

    if (!$DB)
    {
        die("MySQL could not select Database!");
    }
}

// Later runs: continue with the page stored in the session.
if (isset($_SESSION['page']))
{
    $connect = mysql_connect("xxxxx", "xxxxx", "xxxx");

    if (!$connect)
    {
        die("MySQL could not connect!");
    }

    $DB = mysql_select_db('xxxx');

    if (!$DB)
    {
        die("MySQL could not select Database!");
    }
    $PAGE = $_SESSION['page'];
    $original_file = file_get_contents("$PAGE");
}

// Keep only the <a> tags, then capture every href value.
$stripped_file = strip_tags($original_file, "<a>");
preg_match_all("/<a(?:[^>]*)href=\"([^\"]*)\"(?:[^>]*)>(?:[^<]*)<\/a>/is", $stripped_file, $matches);

foreach ($matches[1] as $key => $value)
{
    // Relative link (does not start with http:// or https://): prefix it with the domain.
    // strpos() returns 0 for a match at the start and false for no match, so compare with !== 0.
    if (strpos($value, "http://") !== 0 && strpos($value, "https://") !== 0)
    {
        $New_URL = "http://" . $domain . $value;
    }
    else
    {
        $New_URL = $value;
    }
    $New_URL = addslashes($New_URL);
    $Check = mysql_query("SELECT * FROM pages WHERE url='$New_URL'");
    $Num = mysql_num_rows($Check);

    // Store the URL only if it is not in the table yet.
    if ($Num == 0)
    {
        mysql_query("INSERT INTO pages (url) VALUES ('$New_URL')");

        $_SESSION['i']++;

        echo $_SESSION['i'] . "";
    }
    echo mysql_error();
}

// Pick the next page to crawl and remember it in the session.
$RandQuery = mysql_query("SELECT DISTINCT * FROM pages ORDER BY rank LIMIT 0,1");
$RandReturn = mysql_num_rows($RandQuery);
while ($row1 = mysql_fetch_assoc($RandQuery))
{
    $_SESSION['page'] = $row1['url'];
}
echo $RandReturn;
echo $_SESSION['page'];
mysql_close();

?>

First of all, why do you put quotes around the variable in this line?

$original_file = file_get_contents("$PAGE");

Second, all meta tags can be retrieved with get_meta_tags():

$tags = get_meta_tags('http://www.example.com/');

See the get_meta_tags() documentation on php.net.

So in your example, I guess you will have to use:

$tags = get_meta_tags($New_URL);

And save that array in your database.
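
As a rough sketch of that idea: the helper name fetch_meta_tags() and the choice to serialize the array with json_encode() before storing it are my own assumptions, not part of the original crawler. The demo reads a temporary local file instead of a live URL so it can run offline. In the crawler you would pass the URL instead, before addslashes() is applied to it. Note that get_meta_tags() only returns tags that have a name attribute (keys are lowercased), so http-equiv tags are not included.

```php
<?php
// Hypothetical helper: returns the name => content pairs found in a page.
// $source may be a local file path or a URL.
function fetch_meta_tags(string $source): array
{
    // get_meta_tags() returns false on failure, so normalize that to an empty array.
    $tags = @get_meta_tags($source);
    return $tags === false ? [] : $tags;
}

// Demo with a temporary local file standing in for a crawled page.
$tmp = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents(
    $tmp,
    '<html><head>'
    . '<meta name="description" content="Example page">'
    . '<meta name="keywords" content="php, crawler">'
    . '</head><body></body></html>'
);

$tags = fetch_meta_tags($tmp);
unlink($tmp);

// One possible way to keep the whole array in a single text column (assumption).
$serialized = json_encode($tags);
echo $serialized, "\n";
```

The serialized string could then go into an extra column of the pages table next to the URL; that column does not exist in the original schema and would have to be added.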

I ran into this problem before when reading HTML tags from an external source. Jstel posted a solution that worked well for me, and I believe you can incorporate her solution into yours.

http://www.php.net/manual/en/function.get-meta-tags.php#92197

Based on your code, here's how it works:

$domain = "www.ebay.com";
$original_file = file_get_contents("http://" . $domain . "/");
preg_match_all("/<meta[^>]+(http\-equiv|name)=\"([^\"]*)\"[^>]" . "+content=\"([^\"]*)\"[^>]*>/i",$original_file, $result);
print_r($result);

Below is a sample result I got from this regex:

Array
(
    [0] => Array
        (
            [0] => <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
            [1] => <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
            [2] => <meta name="keywords" content="ebay, electronics, cars, clothing, apparel, collectibles, sporting goods, digital cameras, antiques, tickets, jewelry, online shopping, auction, online auction">
            [3] => <meta name="description" content="Buy and sell electronics, cars, fashion apparel, collectibles, sporting goods, digital cameras, baby items, coupons, and everything else on eBay, the world's online marketplace">
            [4] => <meta name="verify-v1" content="j6ZKbG61n+f9pUtbkf69zFRBrRSeUqyfEJ2BjiRxWDQ=">
            [5] => <meta name="y_key" content="acf32e2a69cbc2b0">
            [6] => <meta name="msvalidate.01" content="31154A785F516EC9842FC3BA2A70FB1A">
        )

    [1] => Array
        (
            [0] => http-equiv
            [1] => http-equiv
            [2] => name
            [3] => name
            [4] => name
            [5] => name
            [6] => name
        )

    [2] => Array
        (
            [0] => Content-Type
            [1] => Content-Type
            [2] => keywords
            [3] => description
            [4] => verify-v1
            [5] => y_key
            [6] => msvalidate.01
        )

    [3] => Array
        (
            [0] => text/html; charset=UTF-8
            [1] => text/html; charset=UTF-8
            [2] => ebay, electronics, cars, clothing, apparel, collectibles, sporting goods, digital cameras, antiques, tickets, jewelry, online shopping, auction, online auction
            [3] => Buy and sell electronics, cars, fashion apparel, collectibles, sporting goods, digital cameras, baby items, coupons, and everything else on eBay, the world's online marketplace
            [4] => j6ZKbG61n+f9pUtbkf69zFRBrRSeUqyfEJ2BjiRxWDQ=
            [5] => acf32e2a69cbc2b0
            [6] => 31154A785F516EC9842FC3BA2A70FB1A
        )

)
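
The parallel arrays in that result can be folded into a single name => content map, which is usually easier to store. This is only a sketch: extract_meta() is a hypothetical helper name, the pattern is the same regex written in single quotes, and the sample HTML is shortened from the output above.

```php
<?php
// Hypothetical helper: runs the same regex and folds the result into name => content pairs.
function extract_meta(string $html): array
{
    $pattern = '/<meta[^>]+(http-equiv|name)="([^"]*)"[^>]+content="([^"]*)"[^>]*>/i';

    // PREG_SET_ORDER groups the captures per matched tag instead of per capture group.
    preg_match_all($pattern, $html, $result, PREG_SET_ORDER);

    $meta = [];
    foreach ($result as $match) {
        // $match[2] is the name / http-equiv value, $match[3] is the content.
        // A repeated key (such as the duplicated Content-Type) keeps the last value.
        $meta[strtolower($match[2])] = $match[3];
    }
    return $meta;
}

$html = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
      . '<meta name="keywords" content="ebay, electronics, cars">'
      . '<meta name="y_key" content="acf32e2a69cbc2b0">';

$meta = extract_meta($html);
print_r($meta);
```

Keep in mind that this pattern only matches double-quoted attributes where name or http-equiv comes before content, so tags written in another order or with single quotes are skipped.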