I'm evaluating the XmlStreamer class to process potentially large XML files. It works great with tiny files, but as soon as I feed it a large XML file it eats all available memory and hits whatever memory_limit I set (even though I haven't yet written any code that uses the data).
As far as I can tell, it loops through the top-level nodes (excluding the root element) and calls processNode() when it finishes reading each one. The callback receives a string containing the complete XML of the node, and that seems to be the only way to access the data. (The provided example suggests parsing it with SimpleXML.) This approach will clearly fail as soon as a top-level node contains several MB of data, which is my case.
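To make the memory concern concrete, here is a minimal sketch of what the body of processNode() amounts to when it follows the SimpleXML suggestion. The helper name summarizeNode() and the sample element are made up for illustration and are not part of XmlStreamer; the only real calls are PHP's simplexml_load_string() and SimpleXMLElement methods.

```php
<?php
// Hypothetical stand-in for the body of processNode(): it receives the
// complete XML of one top-level node as a string and parses it in one go.
function summarizeNode(string $xmlString): array
{
    // The whole node is already in memory as a string at this point,
    // and SimpleXML builds a second, in-memory tree on top of it.
    $node = simplexml_load_string($xmlString);
    if ($node === false) {
        throw new InvalidArgumentException('Node is not well-formed XML');
    }

    return [
        'element'  => $node->getName(),          // name of the top-level node
        'children' => count($node->children()),  // number of direct children
        'bytes'    => strlen($xmlString),        // size of the string held in memory
    ];
}
```

Nothing in this pattern can start working before the full node string exists, so the memory cost grows with the size of the largest top-level node.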
The class also allows overriding chunkCompleted() «to improve performance». It seems to be exactly what I need, but I can't figure out how to use it for anything useful.
The class calls chunkCompleted() at regular intervals, but I can't figure out how to access the partially read data. The callback doesn't receive any parameters, all class properties are private, and there are no methods I can call. The example given populates custom properties in processNode() and reads them back in chunkCompleted(), but that's pretty pointless: the data is not available until the complete top-level node has been processed and loaded into memory. Every call to chunkCompleted() does nothing except the last one, and in that case I could have done my work directly in processNode().
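For comparison, this is the kind of incremental access I mean, sketched with PHP's built-in XMLReader rather than XmlStreamer. It is only an illustration of the behaviour I'm after, not something XmlStreamer offers as far as I can see; the function name listNestedElements() and the sample document are hypothetical.

```php
<?php
// Sketch only: uses PHP's built-in XMLReader, not XmlStreamer.
// Collects the names of elements nested inside top-level nodes as they are
// encountered, without first building a string for the whole top-level node.
function listNestedElements(string $xml): array
{
    $reader = new XMLReader();
    // XML() reads from a string; open() would take a file path or URI instead.
    if ($reader->XML($xml) === false) {
        throw new RuntimeException('Could not initialise the reader');
    }

    $seen = [];
    while ($reader->read()) {
        // depth 0 = root element, depth 1 = top-level nodes, depth 2 = their children
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->depth === 2) {
            $seen[] = $reader->name;
        }
    }
    $reader->close();

    return $seen;
}
```

Here each child element is visible as soon as the cursor reaches it, which is what I expected chunkCompleted() to make possible.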
Furthermore, if I reduce the chunk size the class starts missing nodes, suggesting that it cannot process elements that don't fit in a single chunk.
Am I missing something obvious, or is the library just not production-ready?
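    class FirstExample extends XmlStreamer{
        // Called once per top-level node, with the node's complete XML as a string
        public function processNode($xmlString, $elementName, $nodeIndex){
            echo __METHOD__ . PHP_EOL;
            var_dump($xmlString, $elementName, $nodeIndex);
        }

        // Called at regular intervals; receives no parameters
        public function chunkCompleted(){
            echo __METHOD__ . PHP_EOL;
        }
    }

    // Second argument is the chunk size (deliberately small here)
    $xml = new FirstExample('/path/to/my.xml', 80);
    if( $xml->parse() ){
        echo 'OK' . PHP_EOL;
    }else{
        echo 'Error' . PHP_EOL;
    }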