I have a file containing plot data; each line has 4 coordinates, and the data file can exceed 1 GB. Say I want to extract the third column from the data file: which method is considered good practice, and which is faster?
Using execute:
exec("awk '{ print $3 }' data", $output);
Using PHP script:
$data = file("data");
$points = array();
foreach ($data as $line) {
    // split on whitespace to get the third field
    // (note: $line[2] would only give the third *character*)
    $cols = preg_split('/\s+/', trim($line));
    $points[] = $cols[2];
}
Moreover, since the server does not allow reading a large file in one go, I have to use fread to read the file in several parts. But fread is not line-aware, so some work must be done to stitch the partial last line of each chunk onto the start of the next. Any suggestion, or a better method to read a column from a file in PHP?
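For the chunked-read problem, one common approach is to carry the partial trailing line of each chunk over into the next read. This is a minimal sketch under the stated assumptions (whitespace-separated columns, third field wanted); `third_column` is a hypothetical helper name:

```php
<?php
// Read a large file in fixed-size chunks, carrying any partial
// trailing line over to the next chunk so no line is ever split.
function third_column($path, $chunkSize = 1048576) {
    $points = array();
    $fp = fopen($path, "r");
    $carry = "";
    while (($buf = fread($fp, $chunkSize)) !== false && $buf !== "") {
        $buf = $carry . $buf;
        $lastNl = strrpos($buf, "\n");
        if ($lastNl === false) {               // no complete line yet
            $carry = $buf;
            continue;
        }
        $carry = substr($buf, $lastNl + 1);    // partial last line
        foreach (explode("\n", substr($buf, 0, $lastNl)) as $line) {
            $cols = preg_split('/\s+/', trim($line));
            if (count($cols) >= 3) {
                $points[] = $cols[2];          // third column
            }
        }
    }
    if (trim($carry) !== "") {                 // final line without newline
        $cols = preg_split('/\s+/', trim($carry));
        if (count($cols) >= 3) {
            $points[] = $cols[2];
        }
    }
    fclose($fp);
    return $points;
}
```

Memory use stays bounded by the chunk size (plus the result array), no matter how large the file is.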
Here /file is a 3.1 GB file:
root# time awk '{ print $3 }' /file >/dev/null
real 1m42.430s
user 1m0.241s
sys 0m2.198s
OK, roughly 1.7 minutes for awk. Let's test PHP (without field splitting, just reading the third character of each line):
root# time php -r '$fp = fopen("/file", "r"); while (($buf = fgets($fp)) !== false) echo $buf[2]; fclose($fp);' >/dev/null
real 4m17.322s
user 3m16.571s
sys 0m31.625s
Roughly 4.3 minutes for PHP! I don't want to imagine how long it would take with @Jack's code...

PHP is far slower than awk. On really big files, use awk (invoked via exec()). As you can see here, PHP spends a lot of time in user space (three times more than awk).
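Calling awk from PHP can be wrapped in a small helper like this sketch (the function name is hypothetical; it assumes awk is on the PATH and whitespace-separated columns):

```php
<?php
// Shell out to awk and capture its stdout, one array element per line.
function awk_third_column($path) {
    // \$3 keeps the dollar sign out of PHP string interpolation;
    // escapeshellarg() protects against shell metacharacters in the path
    $cmd = "awk '{ print \$3 }' " . escapeshellarg($path);
    exec($cmd, $output, $status);
    return $status === 0 ? $output : false;
}
```

The field splitting happens inside awk, so PHP only has to collect the already-extracted column.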
fgets is your friend - http://ie.php.net/fgets. You can read the file line by line without loading the whole file into memory.
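A minimal fgets-based version might look like this (the function name is hypothetical, and splitting on whitespace is an assumption about the file format):

```php
<?php
// Line-by-line reader: memory use stays constant regardless of file size.
function fgets_third_column($path) {
    $fp = fopen($path, "r");
    if ($fp === false) {
        return false;
    }
    $points = array();
    while (($line = fgets($fp)) !== false) {
        $cols = preg_split('/\s+/', trim($line));
        if (count($cols) >= 3) {
            $points[] = $cols[2];   // third column
        }
    }
    fclose($fp);
    return $points;
}
```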