Here is the code:
[code="php"]<?php
$url = "http://news.sise.com.cn/show.php?id-1611.html";
//$html = file_get_contents($url);
$html=iconv("gb2312", "utf-8",file_get_contents($url));
//echo $html;
$doc=new DomDocument('1.0','utf-8');
$doc->loadHTML($html);
$xpath=new DOMXpath($doc);
$flag = "//p[@class='MsoNormal']";
foreach($xpath->query($flag) as $node){
$link = $node->nodeValue;
echo $link . "\n";
}
?>[/code]
Could someone take a look and tell me how to change it so the output is no longer garbled?
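Open this page: http://news.sise.com.cn/show.php?id-1611.html
and check which encoding it actually uses.
Then decide between
iconv("gb2312", "utf-8",file_get_contents($url)); and
iconv("utf-8", "gb2312",file_get_contents($url));
and between
$doc=new DomDocument('1.0','utf-8');
and
$doc=new DomDocument('1.0','gb2312');
If you really can't tell, just swap them back and forth a few times and you'll find out; there are only a few combinations anyway.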
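Instead of trial and error, the check described above can also be done in code. What follows is only a rough sketch, not a definitive fix. The helper names `detect_charset` and `load_page` are made up for illustration. It assumes the page declares its encoding in a `<meta>` tag and that the iconv and mbstring extensions are available. As far as I know, the encoding argument of the `DomDocument` constructor does not control how `loadHTML` decodes its input, so the sketch converts the text itself: first to UTF-8, then to numeric entities, so that whatever charset the parser guesses no longer matters.

```php
<?php
// Hypothetical helper: read the charset declared in the page's <meta> tag.
// Falls back to $default when no declaration is found.
function detect_charset(string $html, string $default = 'UTF-8'): string
{
    // Covers <meta charset="gb2312"> as well as
    // <meta http-equiv="Content-Type" content="text/html; charset=gb2312">.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return $default;
}

// Hypothetical helper: decode the raw bytes and load them into a DOMDocument.
function load_page(string $raw): DOMDocument
{
    $charset = detect_charset($raw);

    // Assumption: pages that declare gb2312 often contain GBK characters.
    // GBK is a superset of GB2312, so decoding as GBK is the safer choice.
    if ($charset === 'GB2312') {
        $charset = 'GBK';
    }

    // Step 1: normalise the page to UTF-8.
    $utf8 = ($charset === 'UTF-8') ? $raw : iconv($charset, 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("Could not convert from $charset to UTF-8");
    }

    // Step 2: replace every non-ASCII character with a numeric entity,
    // leaving plain ASCII that parses the same under any declared charset.
    $ascii = mb_encode_numericentity($utf8, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect warnings about sloppy HTML
    $doc->loadHTML($ascii);
    libxml_clear_errors();
    return $doc;
}
```

To use it with the code from the question, replace the `iconv` and `loadHTML` lines with `$doc = load_page(file_get_contents($url));` and keep the XPath loop unchanged. `nodeValue` returns UTF-8, so the terminal or browser showing the output also has to be set to UTF-8.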
Try this tool:
php simple dom
It's especially handy for scraping web pages. Just Google it.