How do I serialize and unserialize a large file in PHP? [closed]

I have a HUGE complex data structure (TRIE) that I need to store for later use.

So, I'm using serialize/unserialize (please suggest any better method if any):

$fp = fopen("serialized_trie.txt","w+");
fwrite($fp,serialize($root));
fclose($fp);

$root = unserialize(file_get_contents("serialized_trie.txt"));

The trie itself is built from 1 million words, so it's a big trie.

I need to store this trie somehow. Writing such a big trie to a file takes a huge amount of time, and calling file_get_contents inside unserialize loads the entire file into memory.

Do I need to use a binary file instead of txt file? How?

Also I've read about 3 techniques to store: serialize, json_encode, var_export

Do I need to use json_encode or var_export method in this case?

How do I QUICKLY store the trie and retrieve it?

You didn't specify what the actual file size is. With that said, the serialize function basically turns the variable into an intermediary text form that can be safely written to disk, but it's not at all optimized.
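To make the "intermediary text form" concrete, here is a minimal sketch. It assumes the trie is a nested array keyed by character, which is a guess, since the question does not show how $root is built.

```php
<?php
// Hypothetical miniature trie: nested arrays keyed by character.
$root = ['a' => ['b' => []]];

$text = serialize($root);
echo $text, PHP_EOL;
// a:1:{s:1:"a";a:1:{s:1:"b";a:0:{}}}

// Reading it back yields an identical array.
$copy = unserialize($text);
var_dump($copy === $root);
```

Every key and value carries a type tag and a length, which is why the output grows quickly for a structure with millions of nodes.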

You could try compressing the file before it's written:

$fp = fopen("serialized_trie.gzd","w+");
//gzdeflate supports 0-9 levels of compression
//You might want to experiment
fwrite($fp, gzdeflate(serialize($root), 5));
fclose($fp);

To read in:

$root = unserialize(gzinflate(file_get_contents("serialized_trie.gzd")));

The extension is not important, as there is no standard for raw deflate files, but I'd suggest something other than .txt to indicate this is not an actual text file.
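The write and read steps above can be checked end to end with a small self-contained sketch. The trie_insert() helper, the '$' end-of-word marker, and the sample words are all hypothetical stand-ins for the real trie, and the zlib extension is assumed to be available.

```php
<?php
// Hypothetical helper: inserts a word into a nested-array trie.
function trie_insert(array &$root, string $word): void
{
    $node = &$root;
    foreach (str_split($word) as $ch) {
        if (!isset($node[$ch])) {
            $node[$ch] = [];
        }
        $node = &$node[$ch]; // descend without copying the array
    }
    $node['$'] = true; // end-of-word marker (an assumption of this sketch)
}

$root = [];
foreach (['car', 'card', 'care', 'cat', 'dog'] as $word) {
    trie_insert($root, $word);
}

$plain  = serialize($root);
$packed = gzdeflate($plain, 5);

// Write the compressed bytes to a temporary file and read them back.
$file = tempnam(sys_get_temp_dir(), 'trie');
file_put_contents($file, $packed);
$restored = unserialize(
    gzinflate(file_get_contents($file)),
    ['allowed_classes' => false] // the data is plain arrays, so no classes are needed
);
unlink($file);

printf("plain: %d bytes, deflated: %d bytes\n", strlen($plain), strlen($packed));
var_dump($restored === $root);
```

On a sample this small the compressed form may not be smaller at all. Compare the two sizes on the real trie before settling on a compression level.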

In regards to memory use, this is highly dependent on the size of your trie structure, which you have already indicated is large, but without any specifics.
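Since the actual numbers are missing, one way to get them is to measure. The sketch below uses a hypothetical 10,000-entry array in place of the trie and reports how large the serialized string is and how much memory the script peaked at.

```php
<?php
// Hypothetical stand-in data; replace with the real $root to get real numbers.
$root = [];
for ($i = 0; $i < 10000; $i++) {
    $root['k' . $i] = ['v' => $i];
}

$before = memory_get_usage();
$text   = serialize($root);
$after  = memory_get_usage();

printf("serialized string: %d bytes\n", strlen($text));
printf("memory held after serialize(): %d bytes more than before\n", $after - $before);
printf("peak memory for the script: %d bytes\n", memory_get_peak_usage());
```

Running the same measurement around unserialize() shows how much memory loading the structure needs on top of the file contents.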

As per my answer to your other question, this is going to be many times slower than reading the variable from an in-memory cache.

serialize is built to turn one or more PHP variables into a string that can be written to disk and read back later. PHP also uses it for session support.

json

json_encode is useful if you need to return data to a client that needs or supports JavaScript-compatible values.
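For comparison, a minimal sketch of a JSON round trip, again assuming a hypothetical nested-array trie with a '$' end-of-word marker:

```php
<?php
// Hypothetical miniature trie holding the single word "cat".
$root = ['c' => ['a' => ['t' => ['$' => true]]]];

$json = json_encode($root);
echo $json, PHP_EOL;
// {"c":{"a":{"t":{"$":true}}}}

// Passing true as the second argument decodes JSON objects as associative arrays.
$restored = json_decode($json, true);
var_dump($restored === $root);
```

If the trie is built from instances of a custom class rather than arrays, json_decode returns arrays or stdClass objects, not instances of that class, so the nodes would have to be rebuilt by hand.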

var_export

var_export has some issues with complex data structures. With that said, it is possible to use var_export to write out the trie structure as a php script which could then be require_once(). This might be more performant than these other options.

$fp = fopen("trie.php","w+");
// The second argument makes var_export() return the string instead of printing it
fwrite($fp, '<?php $root = ' . var_export($root, true) . '; ?>');
fclose($fp);

To read back in:

require_once('trie.php');

Obviously your script needs to place trie.php in a location under the webroot that is read/writeable, but that's a whole other discussion. Like any other include() you need the path to the script.
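A variant of the same idea is to have the generated file return the array, so the caller chooses the variable name. This is a sketch under the assumption that the trie is made of plain nested arrays; the temporary file name is made up for illustration.

```php
<?php
// Hypothetical miniature trie holding the single word "cat".
$root = ['c' => ['a' => ['t' => ['$' => true]]]];

// Write a PHP file that simply returns the exported array.
$file = sys_get_temp_dir() . '/trie_' . uniqid() . '.php';
file_put_contents($file, '<?php return ' . var_export($root, true) . ';');

// require evaluates the file and hands back its return value.
$loaded = require $file;
unlink($file);

var_dump($loaded === $root);
```

For tries built from objects, var_export writes calls to a static __set_state() method, which each class then has to implement. Whether this approach is faster than unserialize in practice depends on the setup, for example on whether an opcode cache is enabled, so it is worth timing both on the real data.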