I am trying to get the score table from this page: http://www.skysports.com/football/competitions/bundesliga/table. I fetch the page with:
$bundes = file('http://www.skysports.com/football/competitions/bundesliga/table');
When I try to display the $bundes array, I do it like this:
echo '<pre>', print_r($bundes), '</pre>';
The output I get looks like this:
[1437] =>
[1022] => German Bundesliga 2015/16
# Team Pl W D L F A GD Pts Last 6
1 [1059] => [1060] => Bayern Munich [1061] => [1062] => 9 9 0 0 29 4 25 27 [1072] =>
[1073] =>
[1074] =>
This is the first row of the table. I can display $bundes[1060] and get the output "Bayern Munich", but how can I get the values from $bundes[1062]? The values are 9, 9, 0, 0, 29, 4, 25 and 27. I need to display each of these values in its own <td></td>.
When I try to echo $bundes[1062], I get nothing.
A more reliable way of extracting the data is using DOM manipulation classes to do something like:
$doc = new \DOMDocument();
// @ suppresses the warnings libxml raises on real-world, non-validating HTML
@$doc->loadHTMLFile('http://www.skysports.com/football/competitions/bundesliga/table');
$xpath = new \DOMXPath($doc);
// Select every row inside a table body
$rows = $xpath->query('//tbody/tr');
$data = [];
foreach ($rows as $i => $row) {
    // Cells of this row, queried relative to the row node
    $columns = $xpath->query('td', $row);
    foreach ($columns as $column) {
        $data[$i][] = trim($column->textContent);
    }
}
print_r($data);
Which gives you:
Array
(
    [0] => Array
        (
            [0] => 1
            [1] => Bayern Munich
            [2] => 9
            [3] => 9
            [4] => 0
            [5] => 0
            [6] => 29
            [7] => 4
            [8] => 25
            [9] => 27
            [10] =>
        )
    ...
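To put each value in its own <td>, as the question asks, an array shaped like the output above could be rendered roughly as follows. This is only a minimal sketch: the renderRows() helper name is hypothetical, and the sample row is copied from the output above rather than fetched from the site, so in practice you would pass in the $data array built by the XPath loop.

```php
<?php
// Hypothetical helper (not part of the original answer):
// turns rows of cell strings into <tr><td>...</td></tr> markup.
function renderRows(array $data): string
{
    $html = '';
    foreach ($data as $row) {
        $html .= '<tr>';
        foreach ($row as $cell) {
            // Escape each value so stray markup in scraped text cannot break the table.
            $html .= '<td>' . htmlspecialchars($cell) . '</td>';
        }
        $html .= "</tr>\n";
    }
    return $html;
}

// Sample row copied from the output above (assumption: real data comes from the XPath loop).
$data = [
    ['1', 'Bayern Munich', '9', '9', '0', '0', '29', '4', '25', '27', ''],
];

echo "<table>\n", renderRows($data), "</table>\n";
```

Building the markup in a function that returns a string, instead of echoing inside the loop, keeps the scraping and the presentation separate and makes it easy to drop the trailing empty column if you do not need it.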
Regarding Dagon's comment: no terms of use can disallow crawling and extracting the data (as long as you do so at a reasonable rate that does not impact the website's performance). Terms of use and copyright law do, however, dictate what you can and cannot do with the crawled content (e.g. republishing it).
Web scraping may be against the terms of use of some websites. The enforceability of these terms is unclear (see "FAQ about linking – Are website terms of use binding contracts?").
- Wikipedia, Web scraping: Legal issues
BTW, the page's robots meta tag does allow INDEX.