How to capture uploaded file data in a PHP extension

I'm currently writing a PHP extension in C/C++. A user uploads a file (via POST or PUT, but I can limit it to POST only). I need to capture the file data while it is being uploaded, without writing it to disk on the server. I need to process the data and then, depending on the situation, send it somewhere else or save it to disk. Of course, I know that I can process the file after it has been uploaded (i.e. saved to disk on the server), but I would like to avoid that. I also need to do the opposite: generate a file "on the fly" and send it to the user. All metadata of the generated file (e.g. size, name) is known beforehand.

I've been searching for some time now and could not find anything even close to a solution. Are there any examples or existing PHP extensions that do something like this (or at least something similar)?

I can't comment on hooking into the upload process, but for the download part you need:

  1. a PHP script that handles the download request and sends the HTTP headers;
    care must be taken with the filename: per RFC 2183, only US-ASCII is actually allowed.
  2. a function/method in your PHP extension that streams the data to the browser.
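If the extension also produces the suggested filename, the restriction from point 1 can be enforced in C++ as well. The following is a minimal sketch under the strict reading of RFC 2183 (printable US-ASCII only), which is stricter than the ISO-8859-1 compromise used in the script below. `ascii_safe_filename()` is a hypothetical helper, not part of PHP's API; it replaces every UTF-8 code point outside printable ASCII, as well as '"', '\', ';' and '.', with a single '_'.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of PHP): builds a suggested download filename that
// contains only printable US-ASCII, the strict reading of RFC 2183. Each UTF-8 code
// point outside 0x20..0x7E, and each of '"', '\\', ';' and '.', becomes one '_'.
// The input is not validated: stray continuation bytes are dropped and any other
// non-ASCII byte becomes '_'.
std::string ascii_safe_filename(const std::string& utf8_name)
{
    std::string out;
    out.reserve(utf8_name.size());
    for (unsigned char c : utf8_name)
    {
        if ((c & 0xC0) == 0x80)
            continue;  // UTF-8 continuation byte: its lead byte already produced a '_'
        const bool printable_ascii = (c >= 0x20 && c <= 0x7E);
        const bool problematic = (c == '"' || c == '\\' || c == ';' || c == '.');
        out += (printable_ascii && !problematic) ? static_cast<char>(c) : '_';
    }
    return out;
}
```

Since '.' itself is replaced, the caller would append the extension afterwards, e.g. `ascii_safe_filename(name) + ".zip"`.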

PHP script

Here is a complete PHP script that additionally handles requests for only a byte range of the file (HTTP Range header):

<?php

// sanity checks ...



// script must not timeout
set_time_limit(0);
// user abortion is checked in extension while streaming the data
ignore_user_abort(true);


$filename = $_GET['filename'];
// TODO determine filesize
$filesize = 0;
$offset = 0;
$range_len = -1;
$have_valid_range = false;

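// Range requests are only supported when the file size is known
if (isset($_SERVER['HTTP_RANGE']) && ($filesize > 0))
{
    // split 'bytes=n-m'
    $parts = explode('=', $_SERVER['HTTP_RANGE'], 2);
    // split 'n-m', 'n-' or '-k' (suffix range: the last k bytes)
    $range = explode('-', isset($parts[1]) ? $parts[1] : '', 2);
    // range type can only be 'bytes', check it anyway; multiple ranges ('n-m,o-p') are not supported
    $have_valid_range = ($parts[0] == 'bytes') && (count($range) == 2) && (strpos($parts[1], ',') === false);
    if ($have_valid_range)
    {
        if ($range[0] === '')
        {
            // suffix range
            $range[0] = max(0, $filesize - (int)$range[1]);
            $range[1] = $filesize - 1;
        }
        else
        {
            $range[0] = (int)$range[0];
            // a missing or too large end means "up to the last byte"; offsets are inclusive,
            // so the last valid one is $filesize - 1 (and '0' is a valid end, e.g. 'bytes=0-0')
            if (($range[1] === '') || ((int)$range[1] >= $filesize))
                $range[1] = $filesize - 1;
            else
                $range[1] = (int)$range[1];
        }
        $have_valid_range = ($range[0] < $filesize) && ($range[0] <= $range[1]);
    }
    if (!$have_valid_range)
    {
        header('HTTP/1.1 416 Requested Range Not Satisfiable', true, 416);
        header('Content-Range: bytes */' . $filesize);
        exit;
    }
    $offset = $range[0];
    $range_len = $range[1] - $range[0] + 1;
}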

$attachment_filename = 'xyz';


// send metadata
header('Accept-Ranges: bytes');
if ($have_valid_range)
{
    header('HTTP/1.1 206 Partial Content', true, 206);
    header('Content-Length: ' . $range_len);
    // '*' marks an unknown total length
    header('Content-Range: bytes ' . $range[0] . '-' . $range[1] . '/' . ($filesize > 0 ? $filesize : '*'));
}
else if ($filesize > 0)
{
    header('Content-Length: ' . $filesize);
}

// a note about the suggested filename for saving the attachment:
// It's not as easy as one might think!
// We deal with UTF-8 (in our PHP scripts), and the filename is either the export profile's name or a term 
// entered by the user in the download form. Now the big problem is:
// according to the RFC for the Content-Disposition header, only US-ASCII characters are allowed! 
// (see http://greenbytes.de/tech/webdav/rfc2183.html, section "the filename parameter")
// However, all major browsers accept the filename to be encoded in iso-8859-1 (at least).
// There are other forms like filename*=utf-8''<urlencoded filename> (note: not enclosed in quotes), but not 
// all browsers support them (most notably IE does not; only Firefox and Opera do at the moment);
// (see http://greenbytes.de/tech/tc2231/ for testcases)
// 
// Additionally, IE doesn't handle '.' and ';' well, because it treats them as the beginning of the file extension 
// and then thinks that it deals with a .*&%$§ file instead of a .zip file.
// The double quote '"' is already used as a delimiter for the filename parameter and it's unclear to me 
// how browsers would handle it.
// 
// Hence, the procedure to produce a safe suggested filename as the least common denominator is as follows:
// replace characters known to be problematic with an underscore and encode the filename in ISO-8859-1.
// Note that '?' (which can also result from utf8_decode()), '*', '<', '>', '|', ';', ':', '.' and '\' are replaced 
// with '_' by Firefox and IE anyway, and IE additionally replaces '#'; that is, they offer a filename with the mentioned 
// characters replaced by an underscore, e.g.: abc äöü +~*?ß=}'!§$%&/()´`<>|,-_:__@?\_{[]#.zip  -->  abc äöü +~__ß=}'!§$%&_()´`___,-____@___{[]#.zip 
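// note: utf8_decode() is deprecated as of PHP 8.2; mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8') can replace it
$safe_attachment_fname = utf8_decode(str_replace(array('.', ';', '"'), '_', $attachment_filename)) . '.zip';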
$filename_param = 'filename="' . $safe_attachment_fname . '"';

header('Content-Transfer-Encoding: binary');
header('Content-Type: application/zip');
header('Content-Disposition: attachment; ' . $filename_param);
// allow clients and shared caches (proxies) to store the file
header('Cache-Control: public');


// disable output buffering and stream directly to the browser;
// this is actually a must, otherwise PHP might crash
while (ob_get_level())
    ob_end_flush();


// stream data
ext_downstreamdata($filename, $offset, $range_len);

?>
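The range arithmetic in the script is easy to get subtly wrong: byte offsets are inclusive, so the last valid one is filesize - 1, and an end value of 0 is legitimate. If the extension should evaluate the Range header itself, a minimal C++ sketch for a single `bytes=` range might look like this. `parse_byte_range()`, `RangeResult` and `RangeStatus` are hypothetical names, and multi-range requests are deliberately not handled (the header is ignored and the whole file is sent).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type: how the server should answer a Range request.
enum class RangeStatus { Ignore, Partial, Unsatisfiable };

struct RangeResult
{
    RangeStatus status = RangeStatus::Ignore;
    std::uint64_t first = 0;  // first byte offset, inclusive
    std::uint64_t last = 0;   // last byte offset, inclusive
};

// Parses a non-empty run of decimal digits; rejects anything else.
static bool parse_u64(const std::string& s, std::uint64_t& out)
{
    if (s.empty() || s.size() > 19)  // 19 digits always fit into 64 bits
        return false;
    out = 0;
    for (char c : s)
    {
        if (c < '0' || c > '9')
            return false;
        out = out * 10 + static_cast<std::uint64_t>(c - '0');
    }
    return true;
}

// Evaluates a single "bytes=n-m", "bytes=n-" or "bytes=-k" range against file_size.
RangeResult parse_byte_range(const std::string& header, std::uint64_t file_size)
{
    RangeResult r;
    const std::string prefix = "bytes=";
    if (header.compare(0, prefix.size(), prefix) != 0)
        return r;  // unknown unit: ignore the header and send the whole file
    const std::string spec = header.substr(prefix.size());
    const std::size_t dash = spec.find('-');
    if (dash == std::string::npos || spec.find(',') != std::string::npos)
        return r;  // malformed or multiple ranges: not handled in this sketch
    const std::string a = spec.substr(0, dash);
    const std::string b = spec.substr(dash + 1);
    std::uint64_t n = 0, m = 0;
    if (a.empty())
    {
        // suffix range: the last m bytes of the file
        if (!parse_u64(b, m))
            return r;
        if (m == 0 || file_size == 0)
        {
            r.status = RangeStatus::Unsatisfiable;
            return r;
        }
        r.first = (m >= file_size) ? 0 : file_size - m;
        r.last = file_size - 1;
    }
    else
    {
        if (!parse_u64(a, n) || (!b.empty() && !parse_u64(b, m)))
            return r;
        if (!b.empty() && m < n)
            return r;  // invalid (last < first): ignore the header
        if (n >= file_size)
        {
            r.status = RangeStatus::Unsatisfiable;
            return r;
        }
        r.first = n;
        // a missing or too large end means "up to the last byte"
        r.last = (b.empty() || m >= file_size) ? file_size - 1 : m;
    }
    r.status = RangeStatus::Partial;
    return r;
}
```

For a `Partial` result the response would use status 206 with `Content-Range: bytes first-last/filesize` and `Content-Length: last - first + 1`; for `Unsatisfiable`, RFC 7233 recommends 416 together with `Content-Range: bytes */filesize`.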

Streaming from C/C++

Now, for the C++ part: the function ext_downstreamdata() called in the PHP script above is entirely implementation-specific, but the data streaming itself can be generalized.

For example, I had to stream file data in a multi-tier application directly from the app server to the browser.

Here is a function that serves as the callback of a streaming function in your C++ code; it receives a pointer to the data and its length, and returns a Windows error code:

// PHP 5 API (TSRMLS_*, php_write(void*, uint) returning int); in PHP 7+ the TSRMLS_* macros
// are no-ops (removed in PHP 8) and php_write() takes and returns size_t.
// The ERROR_* values are Windows error codes from <windows.h>.
unsigned long stream2browser(const void* pdata, size_t nLen)
{
    if (nLen)
    {
        // fetch Zend's thread-local storage (TSRM) context
        TSRMLS_FETCH();

        // send the data via the Zend engine to the browser;
        // this goes through whatever output buffering mechanism (compression, ...) is active.
        // It's a really good idea to turn off all output buffer levels in the PHP script because of
        // strange crashes somewhere within the Zend engine (or in one of the extensions?).
        // I did some debugging: the crash really happens and it's not our fault; turning off output
        // buffering solves all problems. You turn it off like this in the script:
        //  <code>
        //  while (ob_get_level())
        //      ob_end_flush();
        //  </code>
        // An unbuffered output function (e.g. php_ub_body_write) might be an alternative, but I don't know
        // for sure how to use it, so I still stay away from it and rely on the script having disabled all output buffers

        // note: php_write() returns int, and the value may be nLen even if the client has already
        // disconnected, so the abort flag is checked separately; chunks must fit into a uint
        size_t nSent = php_write((void*) pdata, uint(nLen) TSRMLS_CC);
        if (PG(connection_status) & PHP_CONNECTION_ABORTED)
            return ERROR_CANCELLED;
        if (nSent < nLen)
            return ERROR_NOT_CAPABLE;
    }

    return ERROR_SUCCESS;
}
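The callback still needs a producer that reads the requested byte range and feeds it chunk by chunk. A minimal sketch of what the core of an ext_downstreamdata()-style function might look like, assuming the data is available as a seekable `std::istream`; `stream_range()`, `StreamSink` and the `kStream*` constants are hypothetical stand-ins (the real callback returns the Windows codes ERROR_SUCCESS, ERROR_CANCELLED, ...), and the sink has the same shape as stream2browser().

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the Windows codes returned by stream2browser();
// only "0 means success" matches the real ERROR_SUCCESS value.
enum : unsigned long { kStreamSuccess = 0, kStreamCancelled = 1, kStreamNotCapable = 2 };

// Same shape as stream2browser(): data pointer and length in, error code out.
using StreamSink = std::function<unsigned long(const void*, std::size_t)>;

// Hypothetical core of an ext_downstreamdata()-like function: sends range_len
// bytes starting at offset (range_len < 0 means "up to the end") in chunks of
// at most chunk_size bytes and stops at the first error reported by the sink.
unsigned long stream_range(std::istream& in, long long offset, long long range_len,
                           std::size_t chunk_size, const StreamSink& sink)
{
    assert(chunk_size > 0);
    in.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
    if (!in)
        return kStreamNotCapable;  // offset not reachable in this source

    std::vector<char> buf(chunk_size);
    long long remaining = range_len;
    while (range_len < 0 || remaining > 0)
    {
        std::size_t want = chunk_size;
        if (range_len >= 0 && static_cast<unsigned long long>(remaining) < want)
            want = static_cast<std::size_t>(remaining);
        in.read(buf.data(), static_cast<std::streamsize>(want));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0)
            break;  // no more data in the source
        const unsigned long rc = sink(buf.data(), got);
        if (rc != kStreamSuccess)
            return rc;  // e.g. the client aborted the download
        if (range_len >= 0)
            remaining -= static_cast<long long>(got);
    }
    // the source ended before the promised number of bytes (Content-Length) was sent
    if (range_len >= 0 && remaining > 0)
        return kStreamNotCapable;
    return kStreamSuccess;
}
```

Inside the extension, stream2browser could be passed directly as the sink, with offset and range_len taken from the arguments of the PHP function; on Windows one would compare against ERROR_SUCCESS instead of the stand-in constant.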