Persisting a MongoDB cursor between page requests in PHP

I have a very large dataset that I am exporting with a batch process to keep the page from timing out. The whole process can take over an hour. I'm using Drupal batch, which reloads the page between steps and shows how far the process has progressed. Each page request runs the query again, including a sort that takes a while, and then exports data to a temp file. The next page load runs the full Mongo query again, sorts, skips the entries already exported, and exports more to the temp file. The problem is that every page load makes Mongo rerun the entire query and sort. I'd like the next batch page to pick up the same cursor where the previous one left off and continue pulling the next set of results.

The MongoDB Manual entry for cursor.skip() gives some advice:

Consider using range-based pagination for these kinds of tasks. That is, query for a range of objects, using logic within the application to determine the pagination rather than the database itself. This approach features better index utilization, if you do not need to easily jump to a specific page.
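To illustrate what range-based pagination means here, the sketch below walks a sorted key list in fixed-size batches. Each batch starts strictly after the last key seen, instead of skipping N entries. It assumes a unique, indexed sort key (for example _id). The in-memory list and bisect stand in for the index. With pymongo the equivalent query would be roughly collection.find({"_id": {"$gt": last_id}}).sort("_id", 1).limit(batch_size). The function names are illustrative only.

```python
import bisect


def next_batch(sorted_keys, last_key, batch_size):
    """Return up to batch_size keys strictly greater than last_key.

    sorted_keys stands in for an index on the sort field; last_key=None
    means "start from the beginning".
    """
    start = 0 if last_key is None else bisect.bisect_right(sorted_keys, last_key)
    return sorted_keys[start:start + batch_size]


def export_all(sorted_keys, batch_size):
    """Walk the whole 'collection' batch by batch, resuming from the last key."""
    last_key = None
    batches = []
    while True:
        batch = next_batch(sorted_keys, last_key, batch_size)
        if not batch:
            break
        batches.append(batch)
        # This single value is the only state that must survive between requests.
        last_key = batch[-1]
    return batches


if __name__ == "__main__":
    for b in export_all(list(range(1, 11)), 4):
        print(b)
```

Unlike skip(), each batch is located by a range on the index, so later batches cost no more than earlier ones.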

For example, if your nightly batch process runs over the data accumulated in the last 24 hours, you could run date-range queries (say, one per hour of the day) and process your data that way. I'm assuming that your documents each contain some usable timestamp, but you get the idea.
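A minimal sketch of the one-query-per-hour idea follows, assuming each document has a timestamp field. The name "created" is hypothetical. The windows are half-open, [lo, hi), so a document stamped exactly on the hour is counted once. The filter function is an in-memory stand-in for a query such as collection.find({"created": {"$gte": lo, "$lt": hi}}).

```python
from datetime import datetime, timedelta


def hourly_windows(start, hours):
    """Yield half-open (lo, hi) windows, one per hour, beginning at start."""
    for i in range(hours):
        lo = start + timedelta(hours=i)
        yield lo, lo + timedelta(hours=1)


def docs_in_window(docs, lo, hi, field="created"):
    """In-memory stand-in for find({field: {"$gte": lo, "$lt": hi}}).

    The field name "created" is a hypothetical timestamp field.
    """
    return [d for d in docs if lo <= d[field] < hi]


if __name__ == "__main__":
    day = datetime(2013, 1, 1)
    # Hypothetical data: one document every 90 minutes across the day.
    docs = [{"created": day + timedelta(minutes=m)} for m in range(0, 24 * 60, 90)]
    for lo, hi in hourly_windows(day, 24):
        print(lo.strftime("%H:%M"), len(docs_in_window(docs, lo, hi)))
```

Each batch request only needs to know which hour it is processing, and with an index on the timestamp field every window is a cheap range scan.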

Although cursors live on the server and only time out after roughly 10 minutes of inactivity, the PHP driver does not support persisting cursors between requests.

At the end of each request, the driver kills all cursors created during that request that have not been exhausted. This also happens when all references to the MongoCursor object are removed (e.g. $cursor = null).

This is done because it is, unfortunately, fairly common for applications not to iterate over the entire cursor, and we don't want to leave unused cursors around on the server, as that could hurt performance.

For your specific case, the best way to work around this problem is to improve your indexes so that loading the cursor is faster. You may also want to select only a subset of the data at a time, so that you have fixed points you can request data between.
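One way to get such fixed points, sketched below under stated assumptions, is to record the current maximum sort key on the first request. Each later request then asks only for keys in (last_key, upper]. Documents inserted during the hour-long export therefore do not shift the batches. The sandbox dict is a hypothetical stand-in for per-batch state that the framework keeps between requests, similar in spirit to the sandbox array Drupal's Batch API passes to each operation. In MongoDB terms, each step would correspond to a query like {"_id": {"$gt": last_key, "$lte": upper}} sorted on _id with a limit.

```python
import bisect


def batch_step(sandbox, sorted_keys, batch_size, exported):
    """Run one request's worth of the export; return the fraction finished.

    sandbox: dict persisted between requests (hypothetical framework state).
    sorted_keys: current sorted keys; may grow between requests.
    exported: list that receives the exported keys (stands in for the temp file).
    """
    if "upper" not in sandbox:
        # First request: fix the end point so documents added later are
        # excluded, and start before the first key.
        sandbox["upper"] = sorted_keys[-1] if sorted_keys else None
        sandbox["last_key"] = None
    upper = sandbox["upper"]
    if upper is None:
        return 1.0  # nothing to export
    last = sandbox["last_key"]
    lo = 0 if last is None else bisect.bisect_right(sorted_keys, last)
    hi = bisect.bisect_right(sorted_keys, upper)  # count of keys <= upper
    batch = sorted_keys[lo:min(lo + batch_size, hi)]
    exported.extend(batch)
    if batch:
        sandbox["last_key"] = batch[-1]
    done = lo + len(batch)
    return 1.0 if done == hi else done / hi


if __name__ == "__main__":
    keys = list(range(1, 11))
    sandbox, out, finished = {}, [], 0.0
    while finished < 1.0:
        finished = batch_step(sandbox, keys, 4, out)
        keys.append(keys[-1] + 1)  # simulate a new document arriving between requests
    print(out)
```

Only two small values (upper and last_key) need to be carried between page loads, rather than a live cursor.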

Say, for reports, your first request asks for all data from 1am to 2am, your next request asks for all data from 2am to 3am, and so on, as Saftschleck explains.

You may also want to look into the aggregation framework, which is designed to do "online reporting": http://docs.mongodb.org/manual/aggregation/