I have a repetitive task that I do daily: log in to a web portal, click a link that pops open a new window, and then click a button to download an Excel spreadsheet. It's a time-consuming task that I would like to automate.
I've been doing some research with PHP and cURL, and while it seems like it should be possible, I haven't found any good examples. Has anyone ever done something like this, or do you know of any tools that are better suited for it?
Are you familiar with the basics of HTTP requests? For example, do you know the difference between a GET and a POST request? If what you're doing amounts to nothing more than GET requests, then it's quite simple and you may not need cURL at all, since PHP's file_get_contents() can fetch a URL when allow_url_fopen is enabled. But if "clicking a button" means submitting a POST form, or the portal tracks your login with cookies, then cURL is the more practical choice.
One way to check this is to use a tool such as Live HTTP Headers, or the Network tab of your browser's developer tools, and watch which requests happen when you click on your links/buttons. It's up to you to figure out which variables need to be passed along with each request and which URLs you need to use.
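If you'd rather not read the form fields off by hand, a rough sketch like the following can list them from a saved copy of the page. Note that describeForms() is a made-up helper name, and the sketch assumes PHP's DOM extension is available. It only looks at <input> elements, so <select> and <textarea> fields would need the same treatment.

```php
<?php
// Hypothetical helper: list the action URL, method, and named <input>
// fields of every <form> found in an HTML string.
function describeForms(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $forms = [];
    foreach ($doc->getElementsByTagName('form') as $form) {
        $fields = [];
        foreach ($form->getElementsByTagName('input') as $input) {
            $name = $input->getAttribute('name');
            if ($name !== '') {
                // Unnamed inputs are not submitted, so skip them.
                $fields[$name] = $input->getAttribute('value');
            }
        }
        $forms[] = [
            'action' => $form->getAttribute('action'),
            // A form without a method attribute submits with GET.
            'method' => strtoupper($form->getAttribute('method') ?: 'GET'),
            'fields' => $fields,
        ];
    }
    return $forms;
}
```

Calling print_r(describeForms($html)) on the login page should show which names (including any hidden token fields) the POST request has to carry.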
But assuming that there is at least one POST request, here's a basic script that will post data and get back whatever HTML is returned.
<?php
if ( $ch = curl_init() ) {
    // Form fields to submit; replace these with the names and values the portal expects.
    $data  = 'field1=' . urlencode('somevalue');
    $data .= '&field2[]=' . urlencode('someothervalue');
    $url = 'http://www.website.com/path/to/post.asp';
    // Some sites reject requests that lack a browser-like User-Agent.
    $userAgent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)';
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    // Return the response body from curl_exec() instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Skips certificate verification; only do this against a server you trust.
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    $html = curl_exec($ch); // false if the request failed
    curl_close($ch);
} else {
    $html = false;
}
// write code here to look through $html for
// the link to download your excel file
?>
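For that last step, something along these lines might work. findExcelLinks() is a hypothetical helper; it assumes the download is a plain <a href> whose path ends in .xls or .xlsx and that PHP's DOM extension is available. If the portal's button submits a form instead, this will find nothing and you would need to replay that form request.

```php
<?php
// Hypothetical helper: return the href of every <a> whose path ends in
// .xls or .xlsx (case-insensitive).
function findExcelLinks(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings quietly; portal HTML is often not valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        // Check only the path part so a query string does not hide the extension.
        $path = (string) parse_url($href, PHP_URL_PATH);
        if (preg_match('/\.xlsx?$/i', $path)) {
            $links[] = $href;
        }
    }
    return $links;
}
```

The returned hrefs may be relative, so they would still have to be resolved against the page URL before being requested.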
Try this (it is meant to run inside a class that provides getCSRFToken() and signIn()):
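$ch = curl_init();
// getCSRFToken() and signIn() are your own methods; you have to write them.
$csrf_token = $this->getCSRFToken($ch); // fetch the CSRF token from the website, if it needs one
$ch = $this->signIn($ch, $csrf_token);  // sign in and return the same handle
curl_setopt($ch, CURLOPT_HTTPGET, true);
// Needed unless signIn() already sets it; otherwise curl_exec() prints the body and returns true.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 300); // allow up to 300 seconds in case the file is large
curl_setopt($ch, CURLOPT_URL, "https://your-URL/anything");
$return = curl_exec($ch);
// the important part: write the response body to disk
$destination = "files.xlsx";
if ($return === false) {
    echo "download failed: " . curl_error($ch);
} else {
    // "wb" truncates any existing file, so no unlink() is needed first.
    $file = fopen($destination, "wb");
    fputs($file, $return);
    if (fclose($file)) {
        echo "downloaded";
    }
}
curl_close($ch);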
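If the file is large, holding the whole response in a variable can use a lot of memory. One possible variation, assuming you still have an open handle that is already signed in, is to let cURL write straight to disk with CURLOPT_FILE. downloadToFile() is a hypothetical helper name, not part of cURL.

```php
<?php
// Hypothetical helper: stream a URL to a file using an existing cURL handle.
// Returns true on success, false if the file could not be opened or the
// transfer failed.
function downloadToFile($ch, string $url, string $destination): bool
{
    $fp = fopen($destination, 'wb');
    if ($fp === false) {
        return false;
    }
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects to the real file
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP 4xx/5xx as a failure
    // Set last so it overrides any earlier CURLOPT_RETURNTRANSFER on the handle.
    curl_setopt($ch, CURLOPT_FILE, $fp);
    $ok = curl_exec($ch);
    fclose($fp);
    if ($ok === false) {
        unlink($destination); // do not leave a partial file behind
        return false;
    }
    return true;
}
```

It would be called as downloadToFile($ch, "https://your-URL/anything", "files.xlsx") before the handle is closed.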