I am trying to gather information into a text file that I will later upload to a MySQL database. I want to collect all of the PS3 trophy information, and I will be using this website to gather it: http://www.ps3trophies.org/games/psn/1/. What I need to do is go into each game on every single page, get the game name, and get each of the trophies and all of the information about them. Thanks for any info you can give me.
I recommend using the Simple HTML DOM Parser to do this. You can use jQuery/CSS selectors to navigate elements on the page. You could do something like this:
$html = file_get_html('http://www.ps3trophies.org/games/psn/1/');
$otherPages = $html->find('a[href^=/games/psn/]'); // this will get the links for the 7 other pages
You can then build a selector for the individual game pages and load them the same way; a rough sketch follows below. Read through the parser documentation for everything else you can do.
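Here is a minimal sketch of that second step. It assumes simple_html_dom.php is included, and the .game a and .trophy selectors are placeholders I made up; inspect the real markup and swap in the right ones:

include 'simple_html_dom.php';

$listing = file_get_html('http://www.ps3trophies.org/games/psn/1/');

// hypothetical selector for the links to individual game pages
foreach ($listing->find('.game a') as $gameLink) {
    $gamePage = file_get_html('http://www.ps3trophies.org' . $gameLink->href);

    // assumes the game name sits in the first <h1> on the game page
    $gameName = $gamePage->find('h1', 0)->plaintext;
    echo $gameName . "\n";

    // hypothetical selector for the trophy rows
    foreach ($gamePage->find('.trophy') as $trophy) {
        echo $trophy->plaintext . "\n"; // raw text of the trophy entry
    }
}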
In short, you need to use the PHP function file_get_contents()
like so:
$number_of_pages = 8; // adjust to however many listing pages there are
for ($i = 1; $i <= $number_of_pages; $i++) {
    $url = 'http://www.ps3trophies.org/games/psn/' . $i; // the pages are numbered starting at 1
    $html = file_get_contents($url);
    // do a regex search on $html to pinpoint your data
    // save it
}
Now you can use the $html variable, combined with a regular expression, to find the data you need.
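For example, something like this could pull out the links to the individual game pages. The pattern is only a guess at what the listing links look like (regex on HTML is brittle), so check it against the actual page source:

// hypothetical pattern: assumes game links look like <a href="/games/psn/...">Name</a>
preg_match_all('#<a\s+href="(/games/psn/[^"]+)"[^>]*>([^<]+)</a>#i', $html, $matches);

foreach ($matches[1] as $index => $href) {
    $gameUrl  = 'http://www.ps3trophies.org' . $href;  // absolute URL of the game page
    $gameName = trim($matches[2][$index]);             // link text, roughly the game name
    // fetch $gameUrl with file_get_contents() and repeat the idea for the trophy data
}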
Check this out; it will give you the expected output:
<?php
error_reporting(E_ERROR | E_PARSE); // silence the warnings DOMDocument throws on malformed HTML

$dom = new DOMDocument();
$dom->loadHTMLFile('http://www.ps3trophies.org/games/psn/1/');

$xml   = simplexml_import_dom($dom);
$links = $xml->xpath('//table/tr/td/a'); // every link inside the games table

// skip the first 30 links (site navigation etc.); adjust if the page layout changes
for ($i = 30; $i < count($links); $i++):
?>
<a target="_blank" href="http://www.ps3trophies.org<?php echo $links[$i]['href']; ?>"><?php echo $links[$i]['href']; ?></a><br/>
<?php
endfor;
?>
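Once you have those game links, the same DOMDocument/XPath approach can be repeated on each game page to pull out the game name and the trophies. A rough sketch, continuing from the $links array built above; the //h1 and //table[@class="trophies"]//tr expressions are guesses that you would need to adjust to the real markup:

<?php
$gameDom = new DOMDocument();
$gameDom->loadHTMLFile('http://www.ps3trophies.org' . $links[30]['href']); // first game link from the loop above

$gameXml = simplexml_import_dom($gameDom);

$titleNodes = $gameXml->xpath('//h1'); // assumption: the game name is in the first <h1>
$gameName   = $titleNodes ? trim((string) $titleNodes[0]) : 'unknown';

$trophyRows = $gameXml->xpath('//table[@class="trophies"]//tr'); // hypothetical class name for the trophy table

echo $gameName . "\n";
foreach ($trophyRows as $row) {
    // each cell should hold one piece of trophy info (name, description, type, ...)
    foreach ($row->td as $cell) {
        echo trim((string) $cell) . "\t";
    }
    echo "\n";
}
?>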