I'm trying to get dynamically loaded content from a web page, specifically the options loaded into a select. If I do:
$options = $html->find('select[class=theSelectClass]')[0]->find('option');
foreach($options as $option){
echo $option->text().'<br>';
}
This works as expected and my output is:
Select an option
Why? Because the other options are loaded with JS after the page loads. So my question is: how can I get these dynamically loaded options inside the select?
This is my attempt, using an Ajax call in JS and a second PHP page.
In the PHP file that includes simple_html_dom:
$html->load_file($base);
$var = '<script>
var xhttp = new XMLHttpRequest();
xhttp.onreadystatechange = function() {
if (this.readyState == 4 && this.status == 200) {
this.responseText;
}
};
xhttp.open("GET", "http://localhost/crawler/ajax.php?param=HelloWorld", true);
xhttp.send();
</script>';
$e = $html->find("body", 0);
$e->outertext = $e->makeup() . $e->innertext . $var . '</body>';
And my ajax.php file:
file_put_contents ( 'ajax.txt' , $_GET['param']);
I was trying to see whether I could send an Ajax call from the loaded HTML file, but I feel far from being able to do it. So how can I make this happen?
Thank you
It might be easier to render the page with a headless browser first, then pass the rendered HTML to simple_html_dom. You could do this with CasperJS/PhantomJS or another tool that executes the page's JavaScript. For example:
```php
<?php
require "vendor/autoload.php";

use Sunra\PhpSimple\HtmlDomParser;
use Browser\Casper;

$casper = new Casper();

// Forward options to PhantomJS, for example to ignore SSL errors.
$casper->setOptions(array(
    'ignore-ssl-errors' => 'yes'
));

$casper->start('https://www.reddit.com');
// Give the page's JavaScript 5 seconds to run.
$casper->wait(5000);
$casper->run();

// Read the results only after run() has executed the steps above.
$output = $casper->getOutput();
$html = $casper->getHtml();

// Parse the rendered HTML with simple_html_dom.
$dom = HtmlDomParser::str_get_html($html);
$elems = $dom->find("a");
foreach ($elems as $e) {
    echo $e->href, PHP_EOL;
}
?>
```
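The reason rendering has to come first is that an HTML parser only reads the markup it is given and never executes scripts, so a script injected into the parsed document will not run either. Below is a minimal sketch of that behaviour. It uses PHP's built-in DOMDocument as a stand-in for simple_html_dom (an assumption, made only so the example runs without dependencies), and the page source and the "Loaded by JS" option are made up for illustration.

```php
<?php
// Hypothetical page source: the select ships with one placeholder option,
// and the script would append another one in a real browser.
$source = <<<'PAGE'
<html><body>
<select class="theSelectClass">
  <option>Select an option</option>
</select>
<script>
  var select = document.querySelector("select.theSelectClass");
  select.add(new Option("Loaded by JS"));
</script>
</body></html>
PAGE;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

// Select every option inside the select, as the question's find() calls do.
$xpath = new DOMXPath($doc);
$options = $xpath->query('//select[@class="theSelectClass"]/option');

$texts = [];
foreach ($options as $option) {
    $texts[] = trim($option->textContent);
}

// Only the option present in the static markup is found;
// the script is stored as text and never executed.
echo implode(PHP_EOL, $texts), PHP_EOL;
```

Only the placeholder option is printed, which matches the output in the question. A headless browser closes that gap by executing the script before the HTML is handed to the parser.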