CakePHP recommendations for iterating over a huge table and generating a sitemap?

I'm trying to create an XML sitemap using CakePHP, from a table which currently has more than 50,000 records, each record corresponding to a URI in the sitemap. The problem I'm facing is that CakePHP runs out of memory while generating it, for two reasons:

  1. A find('all') is building a huge associative array of the entire set of 50,000 URIs.
  2. Since I don't want to output HTML from the controller itself, I'm transferring the associative array containing URI, priority, change frequency etc., to the view with a $this->set() call -- which again is huge, containing 50,000 indices.

Is it possible at all, to do this while following MVC and CakePHP guidelines?

Are you sure you have to run out of memory on 50,000 records? Even if a row is 1 KB in size (pretty huge), you would only have to deal with ~50 MB of data; my P1 had enough RAM to handle that. Set memory_limit in php.ini higher than the default (and consider also raising max_execution_time).
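If you don't want to touch php.ini globally, the limits can also be raised at runtime for just this request; a minimal sketch (the values here are examples, not recommendations):

```php
<?php
// Raise limits for the current request only; editing php.ini would
// affect every request served by this PHP installation.
ini_set('memory_limit', '256M');
ini_set('max_execution_time', '300'); // seconds; the CLI defaults to 0 (no limit)

echo ini_get('memory_limit'); // the now-active limit
```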

On the other hand, if you consider the data set too huge and processing it too resource-intensive, you should not serve that page dynamically; it is perfect DDoS bait (at the very least, cache it heavily). You could schedule a cron job to re-generate the page every X hours with a server-side script that is free from the MVC penalty of handing all the data to the view at once; it could work through the rows sequentially.
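The core of such a script can stream `<url>` entries to disk one row at a time with PHP's XMLWriter, so memory use stays flat regardless of row count. A sketch, assuming the `$rows` iterable stands in for whatever sequential row source you use (an unbuffered query, a paged loop, etc.) and that each row carries `url`, `changefreq`, and `priority` keys:

```php
<?php
// Stream a sitemap to $path without building the whole document in memory.
function writeSitemap(iterable $rows, string $path): void
{
    $xml = new XMLWriter();
    $xml->openUri($path);
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('urlset');
    $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    foreach ($rows as $row) {
        $xml->startElement('url');
        $xml->writeElement('loc', $row['url']);
        $xml->writeElement('changefreq', $row['changefreq']);
        $xml->writeElement('priority', $row['priority']);
        $xml->endElement(); // </url>
        $xml->flush();      // push buffered bytes out to the file
    }

    $xml->endElement(); // </urlset>
    $xml->endDocument();
    $xml->flush();
}
```

Run it from cron (e.g. `0 */6 * * * php /path/to/generate_sitemap.php`) and serve the resulting static file.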

Have you tried unbindModel() (if you have relations)?
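A minimal sketch of what that looks like; the model and association names here are hypothetical, and unbindModel() only affects the next find:

```php
// Drop the hasMany association for the next query so the related
// table is not queried at all (hypothetical Post/Comment models).
$this->Post->unbindModel(array('hasMany' => array('Comment')));
$urls = $this->Post->find('all', array('fields' => array('Post.url')));
```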

Whenever I have to do huge queries in cakephp I just use the "regular" mysql-functions like mysql_query, mysql_fetch_assoc etc. Much faster, and no lack of memory...

I had a similar problem this week, and stumbled across the Containable Behavior. This allows you to cut down any relationship related queries (if you have any).

The best solution would be to programmatically use LIMIT and OFFSET, and loop through the recordset in small chunks. This saves you from stuffing 50K records into memory at once.
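The loop itself can be sketched in plain PHP; here `$fetchPage` stands in for the actual query (in CakePHP that would be a find with the `limit` and `offset` options) and is assumed to return at most `$limit` rows for the given offset:

```php
<?php
// Yield rows page by page so only one chunk is in memory at a time.
function iterateInChunks(callable $fetchPage, int $limit = 1000): Generator
{
    $offset = 0;
    do {
        $rows = $fetchPage($limit, $offset);
        foreach ($rows as $row) {
            yield $row;
        }
        $offset += $limit;
    } while (count($rows) === $limit); // a short page means we're done
}
```

In CakePHP the fetch callback would wrap something like `$this->Model->find('all', array('limit' => $limit, 'offset' => $offset))`.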

find('all') is way too greedy, you'll have to be more specific if you don't want to run out of memory.

As stated above, use the Containable behavior. If you only need results from your table (without associated tables) and only a couple of fields, a more explicit query like this should be better:

$results = $this->YourModel->find('all', array(
    'contain' => false,
    'fields' => array('YourModel.name', 'YourModel.url')
));

You should also consider adding an HTML cache mechanism (CakePHP has a built-in one, or use the one suggested by Matt Curry).

Of course it will be a cached version and won't be perfectly up to date with your list. If you want more control, you can always save the result in the Cake cache (using Cache::write), using the afterSave/afterDelete callbacks of your model to update the cached value and recreate the cached XML file from there.
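A sketch of the callback side, assuming CakePHP's Cache API; the model class, cache key, and the decision to simply invalidate (letting the next request rebuild and Cache::write the sitemap) are assumptions:

```php
// Hypothetical model: drop the cached sitemap whenever a record changes,
// so the next request regenerates it and writes it back with Cache::write().
class Article extends AppModel {
    public function afterSave($created) {
        Cache::delete('sitemap_xml');
    }

    public function afterDelete() {
        Cache::delete('sitemap_xml');
    }
}
```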

I know this question is old, but for really huge queries there is still no good solution, I think.

To iterate through a huge resultset you can use DboSource methods.

First, get the DBO:

$dbo = $this->Model->getDataSource();

Build the query:

$sql = $dbo->buildStatement($options);

Then execute the statement and iterate through the results:

if ($dbo->execute($sql)) {
    while ($dbo->hasResult() && $row = $dbo->fetchResult()) {
        // $row is an array with the same structure as a find('first') result
    }
}

Use https://github.com/jamiemill/cakephp_find_batch or implement this logic yourself.