What is the most effective way to programmatically fill out an HTML form on a website, using data from a dataset (CSV, JSON, or similar), and then retrieve the results of the submitted form into another dataset? I would like to do this multiple times, populating the form with different parameters on each run, always taking those parameters from my input dataset.
I was reading about Selenium and HTMLUnit, which seem to do similar things, but they require installing dependencies and learning how to use them. Would they be overkill? Is there an easier way to do this, perhaps by writing my own script?
I tried writing a PHP cURL script, but it doesn't send the headers or cookies that the request requires, so I'm not able to retrieve anything:
<?php
/**
 * Send a POST request using cURL.
 * @param string $url     URL to request
 * @param array  $post    values to send
 * @param array  $options extra cURL options; these override the defaults, so
 *                        headers and cookies can be supplied here via
 *                        CURLOPT_HTTPHEADER, CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR
 * @return string|false   response body, or false on failure
 */
function curl_post($url, array $post = array(), array $options = array())
{
    $defaults = array(
        CURLOPT_POST           => 1,
        CURLOPT_HEADER         => 0,
        CURLOPT_URL            => $url,
        CURLOPT_FRESH_CONNECT  => 1,
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_FORBID_REUSE   => 1,
        CURLOPT_TIMEOUT        => 4,
        CURLOPT_POSTFIELDS     => http_build_query($post),
    );

    $ch = curl_init();
    // Options passed by the caller take precedence over the defaults.
    curl_setopt_array($ch, $options + $defaults);

    $result = curl_exec($ch);
    if ($result === false) {
        trigger_error(curl_error($ch));
    }
    curl_close($ch);

    return $result;
}
?>
I'm not sure if that's the right approach.
Any tips/resources would be appreciated.
You can write this script with Selenium: it is just a browser driver and will fill out the form from the client side. If the page isn't very complicated, you can use the requests library in Python and send the POST data directly to the final page. Requests is a much faster library, and writing a script that sends POST data takes about five minutes to learn; a rough sketch is below.
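As an illustration of that requests approach (a sketch, not a drop-in solution), the code below reads parameters from an input CSV, POSTs each row as form data, and writes the responses to an output CSV. The URL, form field names, and CSV column names are placeholders; you would inspect the real form to find its action URL and input names.

# Rough sketch: read rows from input.csv, submit each as an HTML form POST,
# and collect the results into output.csv. FORM_URL, field1/field2 and
# param1/param2 are placeholders for the real form and dataset columns.
import csv
import requests

FORM_URL = "https://example.com/form-handler"  # placeholder: the form's action URL

session = requests.Session()  # a Session keeps cookies between requests
session.headers.update({"User-Agent": "Mozilla/5.0"})  # add headers if the site expects them

with open("input.csv", newline="") as infile, \
     open("output.csv", "w", newline="") as outfile:
    reader = csv.DictReader(infile)
    writer = csv.writer(outfile)
    writer.writerow(["param1", "param2", "result"])

    for row in reader:
        # Map your dataset columns to the form's input names (placeholders here).
        payload = {"field1": row["param1"], "field2": row["param2"]}
        response = session.post(FORM_URL, data=payload, timeout=10)
        response.raise_for_status()
        # Keep whatever part of the response you need; here, the raw body.
        writer.writerow([row["param1"], row["param2"], response.text])

If the form sits behind a login or includes hidden CSRF fields, a preliminary GET with the same Session (to pick up cookies and read the hidden inputs) is usually enough; once heavy JavaScript is involved, Selenium becomes the easier option.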