So I have 16 GB worth of XML files to process (about 700 files in total), and I already have a working PHP script that does it with XMLReader, but it's taking forever. I was wondering whether parsing in Python would be faster. (Python is the only other language I'm proficient in; I'm sure something in C would be faster.)
I think both of them can rely on wrappers around fast C libraries (mostly libxml2), so there shouldn't be too much difference in parsing per se.
You could test whether there are differences caused by overhead, but beyond that it depends on what you're going to do with the XML. What are you parsing it for?
I can't tell you for sure if Python will end up performing better than PHP (because I'm not terribly familiar with the performance characteristics of PHP). I can, however, give you a few suggestions.
Also, if you have some knowledge of C, in Python you can identify bottlenecks in the code and easily reimplement them in C (though I suspect you won't have a chance to do this).
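To make the "identify bottlenecks" step concrete, here is a minimal sketch using Python's built-in cProfile and pstats modules. The functions `parse_step`, `slow_step` and `process` are hypothetical stand-ins for your real script, not anything from the question; only the profiling helper is the point.

```python
import cProfile
import io
import pstats


def parse_step(n):
    # Hypothetical stand-in for the XML parsing work.
    return [i * 2 for i in range(n)]


def slow_step(items):
    # Hypothetical stand-in for a hot spot you might later rewrite in C.
    total = 0
    for item in items:
        total += item ** 2
    return total


def process(n):
    # Hypothetical top-level function of the script being profiled.
    return slow_step(parse_step(n))


def profile_report(func, *args):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the ten entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_report(process, 100_000)
    print(report)
```

Whichever function dominates the cumulative time column is the candidate for optimisation; if nothing stands out, a C rewrite is unlikely to help much.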
There are actually three different performance problems here:
Where you should look for performance improvements depends on which one of these is the biggest bottleneck.
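One rough way to find out which part dominates is to accumulate wall-clock time per stage. The sketch below is only an illustration: the stage labels and the `timed` and `slowest_stage` helpers are made up for this example, and the `time.sleep` calls stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Total seconds spent in each stage, keyed by an arbitrary label.
stage_seconds = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the block to stage_seconds[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start


def slowest_stage(totals):
    """Return the label with the largest accumulated time."""
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Placeholder work; wrap your real parsing and writing code instead.
    with timed("parse"):
        time.sleep(0.01)
    with timed("write"):
        time.sleep(0.03)
    for stage, seconds in stage_seconds.items():
        print(f"{stage}: {seconds:.3f} s")
    print("slowest:", slowest_stage(stage_seconds))
```

Run it over a handful of files rather than all 700; the proportions should be clear quickly.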
My guess is that the last one is the biggest problem, because writes are almost always the slowest part: writes can't be cached, they require writing to disk, and if the data is sorted it can take considerable time to find the right spot to write it.
You presume that the bottleneck is the first alternative, the XML parsing. If that is the case, changing language is not the first thing to do. Instead, you should see if there's some sort of SAX parser for your language. SAX parsing is generally much faster and more memory-efficient than DOM parsing, because it streams through the document instead of building the whole tree in memory.
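For reference, this is roughly what SAX-style parsing looks like with the `xml.sax` module from Python's standard library. It is a minimal sketch: the `TagCounter` handler and the `count_tags` helper are invented for illustration and simply count start tags, where a real script would do its own work in the callbacks.

```python
import io
import xml.sax
from collections import Counter


class TagCounter(xml.sax.ContentHandler):
    """Counts how often each element name occurs, without building a tree."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        # Called once for every opening tag as the parser streams past it.
        self.counts[name] += 1


def count_tags(source):
    """Parse a filename or binary file object and return the tag counts."""
    handler = TagCounter()
    xml.sax.parse(source, handler)
    return handler.counts


if __name__ == "__main__":
    sample = io.BytesIO(b"<items><item id='1'/><item id='2'/></items>")
    print(count_tags(sample))
```

Because only the handler's own state is kept, memory use stays roughly constant regardless of file size, which matters when individual files run to tens of megabytes or more.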