I'm trying to scrape this web page:
https://www.aslteramo.it/SISWebOnLine/ProntoSoccorso.aspx
using PHP and XPath, to get the numbers shown under the red, yellow, green and white circles.
(NOTE: you may see different values if you browse the page yourself. That doesn't matter; they change dynamically.)
I'm trying to use this PHP code sample to print the value:
<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL);
$url = 'http://www.aslteramo.it/SISWebOnLine/ProntoSoccorso.aspx';
$xpath_for_parsing = '/html/body/div/form/div[3]/div[2]/div[3]/div/div/div[2]/table/tbody/tr[2]/td[4]/table/tbody/tr[1]/td';
//#Set CURL parameters: pay attention to the PROXY config !!!!
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_PROXY, '');
$data = curl_exec($ch);
curl_close($ch);
$dom = new DOMDocument();
@$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$colorWaitingNumber = $xpath->query($xpath_for_parsing);
$theValue = 'N.D.';
foreach( $colorWaitingNumber as $node )
{
$theValue = $node->nodeValue;
}
print $theValue;
?>
Note that to get an element's XPath you have to disable JavaScript in your browser, because right-click is disabled on the page.
I've seen that the page makes a POST request,
but I don't know how to modify my code to make that request and then extract my values.
Any help will be appreciated.
Thank you in advance.
"I've seen that in the page there is a POST request ..."
You can't get the data with a plain GET because that POST request is what fetches it on page load. You need to send the same POST request yourself:
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_URL => "https://www.aslteramo.it/SISWebOnLine/ProntoSoccorso.aspx",
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => "",
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 30,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => "POST",
// this is to emulate the page behavior
CURLOPT_POSTFIELDS => "ctl00%24ScriptManager1=ctl00%24MainContent%24UpdatePanel1%7Cctl00%24MainContent%24Timer1&__EVENTTARGET=ctl00%24MainContent%24Timer1&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPDwUKLTYxOTg2MDY2NA9kFgJmD2QWAgIDD2QWBgIDDzwrAA0CAA8WAh4LXyFEYXRhQm91bmRnZAwUKwAGBRMwOjAsMDoxLDA6MiwwOjMsMDo0FCsAAhYQHgRUZXh0BQ1Ib21lIHBhZ2UgQVNMHgVWYWx1ZQUNSG9tZSBwYWdlIEFTTB4LTmF2aWdhdGVVcmwFF2h0dHA6Ly93d3cuYXNsdGVyYW1vLml0HgdUb29sVGlwBRxQYWdpbmEgaW5pemlhbGUgZGVsIHNpdG8gQVNMHgdFbmFibGVkZx4KU2VsZWN0YWJsZWceCERhdGFQYXRoBRdodHRwOi8vd3d3LmFzbHRlcmFtby5pdB4JRGF0YUJvdW5kZ2QUKwACFhIfBWcfBmcfCGcfBwUhL3Npc3dlYm9ubGluZS9wcm9udG9zb2Njb3Jzby5hc3B4HwEFD1Byb250byBTb2Njb3Jzbx8CBQ9Qcm9udG8gU29jY29yc28fBAUeVGVtcGkgZCdhdHRlc2EgUHJvbnRvIFNvY2NvcnNvHghTZWxlY3RlZGcfAwUhL1NJU1dlYk9uTGluZS9Qcm9udG9Tb2Njb3Jzby5hc3B4ZBQrAAIWEB8BBQ5UZW1waSBkJ2F0dGVzYR8CBQ5UZW1waSBkJ2F0dGVzYR8DBSAvU0lTV2ViT25MaW5lL1RlbXBpRGlhdHRlc2EuYXNweB8EBShUZW1waSBkJ2F0dGVzYSBwcmVzdGF6aW9uaSBhbWJ1bGF0b3JpYWxpHwVnHwZnHwcFIC9zaXN3ZWJvbmxpbmUvdGVtcGlkaWF0dGVzYS5hc3B4HwhnZBQrAAIWEB8BBRZMaXN0YSBkJ0F0dGVzYSBFeC1Qb3N0HwIFFkxpc3RhIGQnQXR0ZXNhIEV4LVBvc3QfAwUpamF2YXNjcmlwdDpvcGVuV2ViRm9ybSgnV2ViRXhQb3N0LmFzcHgnKTsfBAUnTW9uaXRvcmFnZ2lvIExpc3RhIGQnQXR0ZXNhIC0gKEV4LVBvc3QpHwVnHwZnHwcFKWphdmFzY3JpcHQ6b3BlbndlYmZvcm0oJ3dlYmV4cG9zdC5hc3B4Jyk7HwhnZBQrAAIWEB8BBR5BdHRpdml0w6AgbGliZXJvLXByb2Zlc3Npb25hbGUfAgUeQXR0aXZpdMOgIGxpYmVyby1wcm9mZXNzaW9uYWxlHwMFHy9TSVNXZWJPbkxpbmUvQXR0aXZpdGFBbHBpLmFzcHgfBAUeQXR0aXZpdMOgIGxpYmVyby1wcm9mZXNzaW9uYWxlHwVnHwZnHwcFHy9zaXN3ZWJvbmxpbmUvYXR0aXZpdGFhbHBpLmFzcHgfCGdkZAIJDw8WAh8BBQ9Qcm9udG8gU29jY29yc29kZAILD2QWAgIBD2QWAmYPZBYGAgEPFgIfBWdkAgsPPCsADQBkAg0PFgIfBWdkGAMFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYBBSBjdGwwMCRNYWluQ29udGVudCRJbWdCdG5BZ2dpb3JuYQUVY3RsMDAkTWFpbkNvbnRlbnQkd3d3D2dkBRBjdGwwMCRuYXZpZ2F0aW9uDw9kBQ9Qcm9udG8gU29jY29yc29kTUucCs6%2BZyLbulTAFPNo569%2B%2BDE%3D&__VIEWSTATEGENERATOR=1A2B14D6&__EVENTVALIDATION=%2FwEWAgK27duvDwKDm%2B%2FCCycw%2FWHLOR5AmzLF035J86RYL0wa&__ASYNCPOST=true",
CURLOPT_HTTPHEADER => array(
"cache-control: no-cache",
"content-type: application/x-www-form-urlencoded"
),
));
$response = curl_exec($curl);
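The __VIEWSTATE and __EVENTVALIDATION values hard-coded above were copied from a single page load, so they may stop being accepted if the page changes. A minimal sketch, assuming the page exposes them as ordinary hidden inputs, is to GET the page first, read the hidden fields from it, and build the POST body from those. The helper names extractHiddenFields and buildTimerPostBody are made up for this example; the timer and panel IDs are taken from the request above.

```php
<?php
// Hypothetical helper (not part of the original answer): collect every named
// hidden input (e.g. __VIEWSTATE, __EVENTVALIDATION) from the page HTML.
function extractHiddenFields(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//input[@type="hidden"][@name]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

// Hypothetical helper: merge the hidden fields with the timer postback
// parameters seen in the hard-coded request and URL-encode the result.
function buildTimerPostBody(array $hiddenFields): string
{
    $timer = 'ctl00$MainContent$Timer1';
    $params = array_merge($hiddenFields, [
        'ctl00$ScriptManager1' => 'ctl00$MainContent$UpdatePanel1|' . $timer,
        '__EVENTTARGET'        => $timer,
        '__EVENTARGUMENT'      => '',
        '__ASYNCPOST'          => 'true',
    ]);
    return http_build_query($params);
}
```

The string returned by buildTimerPostBody() can be passed as CURLOPT_POSTFIELDS in place of the long literal, with the HTML of a prior GET of the same URL as the input to extractHiddenFields().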
And then run your XPath against the response:
$dom = new DOMDocument();
// the POST result is in $response, not $data
@$dom->loadHTML($response);
$xpath = new DOMXPath($dom);
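To finish the extraction, the query still has to be run and its nodes read. The sketch below wraps that in a hypothetical helper, queryText, which is not part of the original code. One assumption worth checking: an XPath copied from a browser's inspector usually contains tbody steps that the browser added itself, and the raw markup may not contain them, so a query that doesn't depend on tbody (such as //table//tr/td) may match where the copied path returns nothing.

```php
<?php
// Hypothetical helper: return the trimmed text of every node matching $query.
function queryText(string $html, string $query): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings instead of using @
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($query);
    if ($nodes === false) {
        return []; // the XPath expression itself was malformed
    }

    $values = [];
    foreach ($nodes as $node) {
        $values[] = trim($node->textContent);
    }
    return $values;
}
```

Calling queryText($response, '//table//tr/td') returns an array of cell texts, which makes it easy to see whether the expression matched anything before narrowing it down to the cell you want.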
Hope that helps.