Real-world problem: I'm generating a page dynamically. This page is an XML document which the user retrieves via server-side scripting (curl, file_get_contents or whatever). Once the user makes the request, he starts waiting, and I start retrieving a large set of data from the DB and building an XML document from it (using PHP's DOM objects). When I'm done, I fire "print $document->saveXML()". It takes about 8 minutes to create this 40-megabyte document, and only when it is ready do I serve it. Now I have a user who has a 60-second connection timeout: he said I need to send at least one octet every 60 seconds. How can I achieve such a thing?
Since it's useless to post 23,987,452 lines of code that nobody is going to read, I'll explain the script that serves this page as real-very-pseudo-pseudo-code:
1) I can't just send arbitrary bytes to keep the connection alive, since it is an XML document and it has to begin with "<?xml..." so as not to break the parser.
2) The user can't change his firewall or server configuration.
3) "Buy a more powerful server" is not an option for me.
4) I tried calling ob_start() at the top of the script and then, at the beginning of each loop iteration, "header("Transfer-Encoding: chunked"); ob_flush();", but nothing arrives before the 8 minutes are up.
Help me guys!!
I would:
1) Generate a random value.
2) Start the XML-generating script as a background process (see e.g. here).
3) Have the generating script, when it is done, write the XML into a file named after the random value.
4) Frequently poll for the existence of that file, e.g. using Ajax requests every 10 seconds, until it's there. Then fetch the XML from the file.
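The four steps above can be sketched in Python; the structure carries over to PHP. This is a minimal sketch under assumptions: build_xml is a hypothetical stand-in for the slow DB-to-XML build, a thread stands in for the separate background process, and the file is written under a temporary name and then renamed, so that step 4 never picks up a half-written file.

```python
import os
import secrets
import tempfile
import threading
import time


def build_xml() -> str:
    # Hypothetical stand-in for the slow (~8 minute) DB-to-XML build.
    time.sleep(0.2)
    return '<?xml version="1.0"?>\n<items><item id="1"/></items>\n'


def start_job(out_dir: str) -> str:
    """Steps 1-3: pick a random name, then build the XML in the background."""
    job_id = secrets.token_hex(16)  # step 1: the random value
    final_path = os.path.join(out_dir, job_id + ".xml")

    def worker() -> None:
        xml = build_xml()
        # Write under a temporary name first, then rename, so a poller never
        # sees a half-written file under the final name.
        tmp_path = final_path + ".part"
        with open(tmp_path, "w", encoding="utf-8") as f:
            f.write(xml)
        os.replace(tmp_path, final_path)  # atomic on the same filesystem

    # Step 2: a thread stands in for a separate background process here.
    threading.Thread(target=worker, daemon=True).start()
    return job_id


def poll(out_dir: str, job_id: str, interval: float = 0.05, timeout: float = 5.0) -> str:
    """Step 4: check for the file periodically and return its contents once it exists."""
    path = os.path.join(out_dir, job_id + ".xml")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                return f.read()
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} not finished")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        jid = start_job(d)
        print(poll(d, jid))
```

In practice the client (Ajax in a browser, or a repeated curl call) does the polling against a small script that only checks whether the file exists yet.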
You can send padding and still have valid XML. Trivial examples include whitespace in many places (though not before the XML declaration), or comments. Once you've sent the XML declaration, you can open a comment and keep sending padding:
<?xml version="1.0"?>
<!-- this comment to prevent timeouts:
30
60
90
⋮
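or whatever; the exact data doesn't matter, of course. Just make sure the padding never contains "--" (not allowed inside an XML comment), and close the comment with "-->" before you send the root element.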
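A sketch of the padding approach, again in Python and under stated assumptions: padded_xml_stream and build_body are hypothetical names, the body is built in a worker thread, and every interval seconds one line of padding (the elapsed seconds, as in the example above) goes out inside the comment. Each yielded chunk still has to be written and flushed to the client; whether it actually arrives in time also depends on the web server and any proxy buffering along the way.

```python
import threading
from typing import Callable, Iterator


def padded_xml_stream(build_body: Callable[[], str], interval: float = 30.0) -> Iterator[str]:
    """Yield the XML declaration, then comment padding until build_body() finishes.

    build_body must return the document WITHOUT its own XML declaration,
    because the declaration has already been sent by then.
    """
    result: dict = {}

    def worker() -> None:
        result["body"] = build_body()

    t = threading.Thread(target=worker)
    t.start()
    yield '<?xml version="1.0" encoding="UTF-8"?>\n'
    yield "<!-- padding to prevent timeouts:\n"
    elapsed = 0.0
    while True:
        t.join(timeout=interval)  # wait up to `interval` seconds for the build
        if not t.is_alive():
            break
        elapsed += interval
        # Padding is just digits, so it can never contain "--".
        yield f"{elapsed:g}\n"
    yield "-->\n"  # close the comment before the root element
    yield result["body"]
```

One drawback of this design: the 200 status line goes out before the build finishes, so if the build fails you can no longer report that through the HTTP status; the client just sees a truncated document.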
That's the easy solution. The better solution is to run the generation in the background and, e.g., use AJAX to poll the server every 10 s to check whether it's done, or to implement an alternative notification method (e.g., emailing a URL when the document is ready).
If this isn't a browser accessing the document, you may want a trivially simple API: one request to start generating the document, and another to fetch it. The fetch request may return "not ready yet" as, e.g., an HTTP status code 500, 503, or 504, and the requesting script should then retry later. (For example, with curl, the --retry option will do this.)
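The two-request API can be sketched as a small, framework-free Python class. DocumentJobs, start and fetch are hypothetical names, jobs live in memory only, and the 202/404 status codes and the "Retry-After: 10" header are my own choices; the answer only specifies the "not ready yet" status. Each method returns (status, headers, body) so it can be wired into whatever web framework serves the two requests.

```python
import secrets
import threading
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], str]


class DocumentJobs:
    """Hypothetical in-memory backend for a 'start' request and a 'fetch' request."""

    def __init__(self, build: Callable[[], str]) -> None:
        self._build = build
        self._lock = threading.Lock()
        self._docs: Dict[str, Optional[str]] = {}  # None means "still generating"

    def start(self) -> Response:
        job_id = secrets.token_hex(16)
        with self._lock:
            self._docs[job_id] = None

        def worker() -> None:
            doc = self._build()
            with self._lock:
                self._docs[job_id] = doc

        threading.Thread(target=worker, daemon=True).start()
        # 202 Accepted: the request was accepted, but processing isn't finished.
        return 202, {"Content-Type": "text/plain"}, job_id

    def fetch(self, job_id: str) -> Response:
        with self._lock:
            if job_id not in self._docs:
                return 404, {"Content-Type": "text/plain"}, "unknown job"
            doc = self._docs[job_id]
        if doc is None:
            # Not ready yet: 503 is one of the codes curl --retry treats as transient.
            return 503, {"Content-Type": "text/plain", "Retry-After": "10"}, "not ready yet"
        return 200, {"Content-Type": "application/xml"}, doc
```

A client could then start a job and poll with something like curl --retry 60 --retry-delay 10 "https://example.com/fetch?id=JOB_ID" (hypothetical URL); since curl's --retry treats a 503 response as a transient error, it keeps retrying until the document is served.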