I am trying to write a .tgz file containing tens, if not hundreds, of thousands of file entries, the contents of each coming from a string in a database. Each entry is about 2-5k of data.
I want to do this without having to write the files out to disk first. Currently I have PHP creating a traditional directory structure, writing the files, and then making a tgz from that at the very end using a shell_exec.
The disk we are using is slow, so writing tens of thousands of files is taking ages. Even running a prototype on another machine with a fast disk, using a tmpfs ramdisk and plenty of CPU, I only get a rate of about 100-200 file entries per second, which feels slow - half an hour for 150,000 files in a directory structure. Once that has been written, the actual conversion from the native OS directory structure to tgz is not problematic.
I was hoping to use PharData to do the writing. However, PharData::addFromString seems to do a file write as soon as the entry is added, rather than following an open->add->write-out pattern.
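For reference, this is roughly the pattern I was attempting (the output file name and the $rows variable standing in for the database result set are illustrative):

    <?php
    $phar = new PharData('export.tar');

    foreach ($rows as $name => $contents) {
        // addFromString appears to hit the disk on every call,
        // rather than buffering until the archive is finished.
        $phar->addFromString($name, $contents);
    }

    $phar->compress(Phar::GZ);   // produces export.tar.gz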
Can anyone suggest any strategies here?
The eventual tgz file is then made available for download and will not be refreshed often. But because there is a whole series of these files to create, having to wait 30-60+ minutes just to package each one becomes quite a blocker.
You can use PHP's gzopen/gzwrite/gzclose functions directly and write out your own tar headers, each followed by the entry's data. There is an example on the PHP gzwrite documentation page.
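A minimal sketch of that approach, assuming the database gives you name => contents pairs ($rows below is a placeholder) and that plain POSIX ustar headers are enough (regular files only, names under 100 bytes; longer names would need the prefix field or GNU long-name entries):

    <?php
    // Build a 512-byte ustar header for one regular file entry.
    function tarHeader(string $name, int $size, int $mtime): string
    {
        $header = pack(
            'a100a8a8a8a12a12a8a1a100a6a2a32a32a8a8a155a12',
            $name,                       // file name
            sprintf('%07o', 0644),       // mode
            sprintf('%07o', 0),          // uid
            sprintf('%07o', 0),          // gid
            sprintf('%011o', $size),     // size in bytes, octal
            sprintf('%011o', $mtime),    // modification time, octal
            '        ',                  // checksum placeholder: 8 spaces
            '0',                         // typeflag: regular file
            '',                          // linkname
            'ustar',                     // magic
            '00',                        // version
            '', '',                      // uname, gname
            '', '',                      // devmajor, devminor
            '',                          // prefix
            ''                           // padding
        );

        // Checksum = byte sum of the header with the checksum field read as
        // spaces, stored as six octal digits, a NUL, then a space.
        $checksum = array_sum(array_map('ord', str_split($header)));
        return substr_replace($header, sprintf("%06o\0 ", $checksum), 148, 8);
    }

    $gz = gzopen('export.tgz', 'wb6');   // compression level 6 as a trade-off

    // $rows stands in for the database result set: entry name => contents.
    foreach ($rows as $name => $contents) {
        $size = strlen($contents);
        gzwrite($gz, tarHeader($name, $size, time()));
        gzwrite($gz, $contents);
        // Pad each entry's data out to a 512-byte block boundary.
        if ($size % 512 !== 0) {
            gzwrite($gz, str_repeat("\0", 512 - ($size % 512)));
        }
    }

    // A tar archive ends with two 512-byte blocks of zeros.
    gzwrite($gz, str_repeat("\0", 1024));
    gzclose($gz);

Since nothing but the finished .tgz ever touches the disk, the per-file write cost disappears; the remaining work is just reading from the database and gzip compression.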