A client of mine has asked me to create a simple site that monitors files on another site. He needs to monitor the file names (I'm not sure why) and have them written to a file.
Here's the example source: http://pastebin.com/tyLUmCJr
I don't speak Russian, so I don't know what the site is about. I apologize if it's anything less than suitable.
Anyway, if you scroll to line 117, you will see a file name. I need to get all of the file names.
I've played around with DOMDocument and third-party tools, although I believe a regex might be faster for this. If anybody could point me in the right direction, it would be greatly appreciated.
Note: keep in mind that the source is stored in a string variable named $content.
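To make the regex idea concrete, here is a rough sketch of what I have in mind. The markup assigned to $content below is a made-up stand-in, and the pattern assumes each file name is the text of a plain `<a href="...">` link, which may not hold for the real page.

```php
<?php
// Hypothetical stand-in for $content; the real page markup may differ.
$content = '<table>'
         . '<tr><td><a href="/123/report.pdf">report.pdf</a></td></tr>'
         . '<tr><td><a href="/124/photo.jpg">photo.jpg</a></td></tr>'
         . '</table>';

// Capture the href (group 1) and the link text (group 2) of every anchor.
// U makes the quantifiers lazy, i ignores case, s lets . span newlines.
$pattern = '/<a\s[^>]*href="([^"]*)"[^>]*>(.*)<\/a>/iUs';
preg_match_all($pattern, $content, $matches);

// Strip any nested tags and surrounding whitespace from the link text.
$fileNames = array_map(function ($text) {
    return trim(strip_tags($text));
}, $matches[2]);

print_r($fileNames);
```

As written, this picks up every link in the string, so on a full page it would also catch navigation links unless $content is first narrowed down to the listing table.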
Cheers!
After some more research, I found a way to do it. Here's how:
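<?php
// phpQuery.php is a local copy of simple_html_dom (provides str_get_html)
require_once("phpQuery.php");
// Page range to scan; cast to int so only numbers reach the URL
$min = isset($_GET['min']) ? (int) $_GET['min'] : 1;
$max = isset($_GET['max']) ? (int) $_GET['max'] : 2;
foreach (range($min, $max) as $page) {
    $raw = file_get_contents("http://www.fayloobmennik.net/files/list/" . $page . ".html");
    if ($raw === false) {
        continue; // skip pages that fail to download
    }
    // The site is served as CP1251; convert once, before parsing
    $html = str_get_html(iconv("CP1251", "UTF-8", $raw));
    $elem = $html ? $html->find('div[id=info] table > tbody', 0) : null;
    if (!$elem) {
        continue; // listing table not found on this page
    }
    foreach ($elem->find('tr a') as $link) {
        // Capture the link text (group 2), which is the file name
        $regex = '/<a href=\"([^\"]*)\">(.*)<\/a>/iU';
        if (preg_match($regex, $link->outertext, $match)) {
            echo $match[2];
            echo "<br/>";
        }
    }
}
?>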
Despite its file name, phpQuery.php here is the simple_html_dom library (I believe that's what it's called).
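Since the client wants the names in a file rather than echoed to the page, the last step could look roughly like this. It is only a sketch: the $fileNames array and the output path are placeholders I made up, standing in for the names collected by the scraping loop and for wherever the file should actually live.

```php
<?php
// Hypothetical list of names, standing in for the values found by the scraper.
$fileNames = ['report.pdf', 'photo.jpg', 'report.pdf'];

// Hypothetical output path; a temp file keeps the sketch self-contained.
$outputFile = sys_get_temp_dir() . '/file_names.txt';

// Drop duplicates, then write one name per line.
// LOCK_EX takes an exclusive lock so overlapping runs don't interleave writes.
$unique = array_values(array_unique($fileNames));
$written = file_put_contents($outputFile, implode(PHP_EOL, $unique) . PHP_EOL, LOCK_EX);

if ($written === false) {
    error_log("Could not write to $outputFile");
}
```

Passing FILE_APPEND | LOCK_EX instead would keep a running log across runs rather than overwriting the file each time.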
Cheers.