Transferring a large, customizable list of items to the client

I’m creating an API that retrieves items from a third-party component and returns them in an XML/CSV/TEXT structure that can be customized by the admin via a template.

The problem: one API request may easily include millions of items, so memory-wise it’s not possible to build the whole list server-side and then send it to the client.

Instead, the items should be created on the fly and sent to the client immediately, without being buffered in PHP’s memory.

How is this possible?

Example template:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

<items>
    {items}
        <item no="{number}">{item}</item>
    {/items}
</items>

Current code example without streaming. Not actually working, but you should get the idea:

echo preg_replace_callback('@{items}(.*){/items}@si', function (array $matches)
{
    return createItems($matches[1]);
}, $template);


function createItems($itemTemplate)
{
    $items = '';
    while (itemsExist()) {
        $items .= getItem($itemTemplate);
    }
    return $items; // the accumulated string was never returned
}

I guess I should stop buffering each item in a variable and instead echo it directly? But how do I keep the XML/CSV/JSON structure intact, along with whatever else is in the template around the list?

If you're reaching the point where the result sets that you're generating on the server are too large to fit into memory, you should consider how the clients of your API are going to process such a large result set too.

There are two patterns that I've seen to solve this kind of problem:

1. Pagination

Use pagination within your API to return pages of results, just as you would on a webpage. Usually, this involves including a URL to the "next page" of results in your API response. The client can then simply iterate through the API responses until no "next page" URL is present in the response, indicating that the end of the result set has been reached.

Your API response would look something like this:

{
   "items": [ { }, { } ... ],
   "next_page": "http://my.domain.com/results?page=2"
}
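A minimal sketch of the server side of this pattern; `buildPage()` is a hypothetical helper, and the items themselves would come from your third-party component:

```php
<?php
// Hypothetical helper that wraps one page of results in the response shape
// shown above. A "next_page" URL is only included while more pages remain.
function buildPage(array $items, int $page, int $perPage, int $total): array
{
    $response = ['items' => $items];
    if ($page * $perPage < $total) {            // more pages remain
        $response['next_page'] =
            'http://my.domain.com/results?page=' . ($page + 1);
    }
    return $response;
}

// In the endpoint, something like:
// echo json_encode(buildPage($items, $page, 100, $total));
```

Because each request only renders one page, peak memory use is bounded by the page size rather than the total result count.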

2. Asynchronous resultset generation

With this approach, your clients would POST to your API and be immediately given a token.

The API would perform the generation of the entire response in the background - usually using a message queue system such as RabbitMQ or SQS - saving the result to a file on the web server. Note that this takes place outside of an HTTP request, so the client is not blocking the webserver for the duration of the process.

The client polls the API regularly, passing the token that it received from the API previously. Eventually, the API will respond with some data to indicate that the resultset has been generated and is ready to be downloaded. The API could then either include the contents of the resultset in its response, or provide a URL that the client could download the resultset from.
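The two endpoints involved might be sketched like this. Note that `queue()`, `jobFinished()` and `resultUrl()` are assumed wrappers around your message-queue system and result storage, not real library calls:

```php
<?php
// POST /export -> enqueue the job and hand the client an opaque token.
function startExport(): string
{
    $token = bin2hex(random_bytes(16));          // 32 hex chars, unguessable
    queue('generate-resultset', ['token' => $token]); // hypothetical helper
    return $token;
}

// GET /export/status?token=... -> polled by the client until ready.
function exportStatus(string $token): array
{
    return jobFinished($token)                   // hypothetical helpers
        ? ['ready' => true, 'url' => resultUrl($token)]
        : ['ready' => false];
}
```

The token is deliberately random rather than sequential, so one client cannot guess and fetch another client's resultset.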

There is a third alternative, but I wouldn't recommend it unless you plan on building client libraries for your API consumers. You could make use of PHP's stream_* functions to create a stream that your API will operate over. This will allow you to push data onto the stream, and your clients to read data from the stream, without consuming high amounts of memory. There is a lot of additional work involved with this, however, especially if you need an entire XML/JSON document to be parsed by the client.
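For completeness, the streaming idea the question hints at can be sketched as follows, assuming the `itemsExist()`/`getItem()` helpers from the question. The key is to split the template once around the `{items}` block, so the structure surrounding the list stays intact even though no item is ever buffered:

```php
<?php
// Split the template into the part before {items}, the per-item template,
// and the part after {/items}.
function splitTemplate(string $template): array
{
    preg_match('@(.*){items}(.*){/items}(.*)@si', $template, $m);
    return [$m[1], $m[2], $m[3]];               // prefix, item template, suffix
}

// Hypothetical streaming loop: echo the prefix, then each rendered item,
// then the suffix -- flushing so data reaches the client immediately.
function streamItems(string $template): void
{
    [$prefix, $itemTemplate, $suffix] = splitTemplate($template);
    echo $prefix;                               // e.g. the <items> opening tag
    while (itemsExist()) {
        echo getItem($itemTemplate);            // render and send one item
        flush();                                // push it to the client now
    }
    echo $suffix;                               // e.g. the closing </items>
}
```

This keeps memory use constant, but as noted above it pushes the burden of incrementally parsing a partial XML/JSON document onto every client.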

I would recommend pagination. It's easy to reason about, not difficult to implement on the API side, reusable, and it removes memory consumption issues on both the client and the server side.