I am attempting to create a cron job that downloads image files that are stored in a queue in our database.
All of the functions we are using work properly when run on our web server; however, when I run the cron job with the following command:
php index.php cron image_download
I receive a Segmentation Fault error.
Debugging the cron job shows that this error occurs when the data is passed to the get_url_content function, which is called here:
foreach ($urls as $url) {
    $content = $this->get_url_content($url);
}
And the function is here:
function get_url_content($url) {
    $agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_URL, $url);
    return curl_exec($ch);
}
Is there a better way to download these files? Is it likely that a different method would not cause the same segmentation fault error? Thank you!
UPDATE: It appears that the various methods I am trying keep causing issues. I am seeing either "Segmentation Fault" or "Killed" errors returned from the cron job. Someone recommended that I look into using Iron.io for this, so I am going to check that out. If anyone has other recommendations for how best to manage this, I would appreciate additional information, thanks.
You can try this approach, but before that, are you giving it the full URL?
function get_content($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_AUTOREFERER, false);
    curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}
function save_content($text, $new_filename) {
    $fp = fopen($new_filename, 'w');
    fwrite($fp, $text);
    fclose($fp);
}
// replace this with your array of urls from the database (make sure it is an array)
$urls = ['http://domain.com/path/to/file.zip', 'http://another.com/path/to/image.img'];

foreach ($urls as $url) {
    $new_filename = basename($url);
    $temp = get_content($url);
    save_content($temp, $new_filename);
}
This fetches each file's contents via its complete URL and saves it to disk, which is the download.
If you are not limited to curl, you may try something like:
$urls = ['http://domain.com/path/to/file.zip', 'http://another.com/path/to/image.img'];

foreach ($urls as $url) {
    $new_filename = basename($url);
    // file_put_contents accepts a stream resource; you could also pass file_get_contents($url) instead
    file_put_contents($new_filename, fopen($url, 'r'));
}
or even
foreach ($urls as $url) {
    $new_filename = basename($url);
    // escape the arguments so an unexpected URL cannot break or inject into the shell command
    shell_exec("wget " . escapeshellarg($url) . " -O " . escapeshellarg($new_filename));
}
Use the curl option CURLOPT_FILE
to have curl write the download directly into a file. For this case I have commented out two other options from your existing code. Here is your modified function:
function get_url_content($url, $file) {
    $fp = fopen($file, 'w+'); // open file handle
    $agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    // curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FILE, $fp); // write the response directly to the file
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // handle redirects
    // curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_exec($ch);
    curl_close($ch); // free the curl handle
    fclose($fp);     // close the file handle
}
Notice that I have added a second parameter ($file, the target file name) to the function, so just pass it your URL and the file path (use an absolute path).
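As a quick illustration, the calling loop from the question could then look something like the sketch below; the /var/path/images/ destination directory is only an assumption, so point it at wherever your cron job should store the files:

foreach ($urls as $url) {
    // build an absolute destination path from the URL's base name (assumed directory)
    $file = '/var/path/images/' . basename($url);
    $this->get_url_content($url, $file);
}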
If you are comfortable with shell scripting,
you can use curl's command-line options to download the files as well. For example, this command downloads an image into a designated file:
curl -s -L "http://img_url/" -o /var/path/image.jpeg
I spent a long time faffing around with alternative ways to do this sort of multi-file download before working out that a straightforward solution is to use ZipArchive. This allows you to create and open a zip file, add files to it, and close it. You can then create a web link to the archive.
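A minimal sketch of that idea, assuming the image files have already been saved to disk (the $files array and the images.zip name are placeholders for illustration):

// build a zip archive from files already on disk; $files and images.zip are placeholder names
$files = ['/var/path/images/photo1.jpg', '/var/path/images/photo2.jpg'];

$zip = new ZipArchive();
if ($zip->open('images.zip', ZipArchive::CREATE | ZipArchive::OVERWRITE) === true) {
    foreach ($files as $file) {
        // store each file under its base name inside the archive
        $zip->addFile($file, basename($file));
    }
    $zip->close();
}

You can then link to images.zip (or serve it with readfile()) so the whole batch is downloaded in one request.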