I have some Python code that scrapes a page, finds all elements with the class name 'group-head', and clicks each one so that all of the page's Ajax calls are performed. This works in Python, but I wanted to know whether the same thing can be done with curl/PHP?
import time
from lxml import html

# Get scraping...
tree = parseLxml(driver=driver, url=url)  # Helper: go to URL and parse
elem = driver.find_elements_by_class_name('group-head')  # Use ChromeDriver to find the elements that trigger the Ajax calls
for x in range(len(elem)):  # Loop through all such elements
    try:
        time.sleep(0.5)
        elem[x].click()  # Click the element
        time.sleep(1.5)  # Too fast and errors can occur, so wait...
    except Exception:
        pass
newpage = driver.page_source  # Get the page source again now that everything is visible
newtree = html.fromstring(newpage)
match = newtree.xpath('//td[contains(@class,"score-time")]/a/@href')  # Scrape the match links
base = 'http://uk.soccerway.com'
for m in match:
    mURL = base + str(m)
    print('Match URL:', mURL)
Your code is using ChromeDriver, so you should look for a PHP WebDriver binding.
Have a look at https://github.com/facebook/php-webdriver; you should be able to use it in much the same way. The code below is not tested, but it should look like:
$host = 'http://localhost:4444/wd/hub'; // Selenium host
$driver = RemoteWebDriver::create($host, DesiredCapabilities::chrome());
$driver->get($url); // Go to URL and load the page
$elements = $driver->findElements(WebDriverBy::className('group-head'));
// ...
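To fill in the rest, here is an untested sketch that mirrors the Python click loop, assuming the php-webdriver classes (RemoteWebDriver, DesiredCapabilities, WebDriverBy) and a Selenium server running on localhost:

```php
<?php
// Sketch only, not tested: mirrors the Python click loop using php-webdriver.
// Assumes a Selenium server at $host and the Composer autoloader.
require_once 'vendor/autoload.php';

use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\WebDriverBy;

$host = 'http://localhost:4444/wd/hub';
$driver = RemoteWebDriver::create($host, DesiredCapabilities::chrome());
$driver->get($url);

$elements = $driver->findElements(WebDriverBy::className('group-head'));
foreach ($elements as $element) {
    try {
        usleep(500000);    // 0.5 s pause before the click
        $element->click(); // trigger the Ajax call
        usleep(1500000);   // 1.5 s wait for the response
    } catch (Exception $e) {
        // Ignore elements that fail to click, as in the Python version.
    }
}

$newpage = $driver->getPageSource(); // page source with the Ajax content loaded
```

From there you can parse $newpage with your HTML parser of choice, just as the Python version does with lxml.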
Yes, that's possible with PHP :)
But you have to follow these steps:
1) Download the PHP Simple HTML DOM Parser.
2) When a link is clicked on the page, make the same Ajax call yourself and fetch its content with file_get_html().
3) Finally, extract the required data by its id, element, or class name.
include('simple_html_dom.php');
$html = file_get_html('http://www.google.com/');

// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}