我怎样才能迅速总结文件中的所有数字?

I have a file which contains several thousand numbers, each on it's own line:

34
42
11
6
2
99
...

I'm looking to write a script which will print the sum of all numbers in the file. I've got a solution, but it's not very efficient. (It takes several minutes to run.) I'm looking for a more efficient solution. Any suggestions?

转载于:https://stackoverflow.com/questions/2702564/how-can-i-quickly-sum-all-numbers-in-a-file

For a Perl one-liner, it's basically the same thing as the awk solution in Ayman Hourieh's answer:

 % perl -nle '$sum += $_ } END { print $sum'

If you're curious what Perl one-liners do, you can deparse them:

 %  perl -MO=Deparse -nle '$sum += $_ } END { print $sum'

The result is a more verbose version of the program, in a form that no one would ever write on their own:

BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    $sum += $_;
}
sub END {
    print $sum;
}
-e syntax OK

Just for giggles, I tried this with a file containing 1,000,000 numbers (in the range 0 - 9,999). On my Mac Pro, it returns virtually instantaneously. That's too bad, because I was hoping using mmap would be really fast, but it's just the same time:

use 5.010;
use File::Map qw(map_file);

map_file my $map, $ARGV[0];

$sum += $1 while $map =~ m/(\d+)/g;

say $sum;

I don't know if you can get a lot better than this, considering you need to read through the whole file.

$sum = 0;
while(<>){
   $sum += $_;
}
print $sum;

You can use awk:

awk '{ sum += $1 } END { print sum }' file

Just to be ridiculous:

cat f | tr "\n" "+" | perl -pne chop | R --vanilla --slave

This is straight Bash:

sum=0
while read -r line
do
    (( sum += line ))
done < file
echo $sum

I have not tested this but it should work:

cat f | tr "\n" "+" | sed 's/+$/\n/' | bc

You might have to add "\n" to the string before bc (like via echo) if bc doesn't treat EOF and EOL...

sed ':a;N;s/\n/+/;ta' file|bc

Here's another one-liner

( echo 0 ; sed 's/$/ +/' foo ; echo p ) | dc

This assumes the numbers are integers. If you need decimals, try

( echo 0 2k ; sed 's/$/ +/' foo ; echo p ) | dc

Adjust 2 to the number of decimals needed.

Another for fun

sum=0;for i in $(cat file);do sum=$((sum+$i));done;echo $sum

or another bash only

s=0;while read l; do s=$((s+$l));done<file;echo $s

But awk solution is probably best as it's most compact.

Here's another:

open(FIL, "a.txt");

my $sum = 0;
foreach( <FIL> ) {chomp; $sum += $_;}

close(FIL);

print "Sum = $sum\n";
cat nums | perl -ne '$sum += $_ } { print $sum'

(same as brian d foy's answer, without 'END')

Just for fun, lets do it with PDL, Perl's array math engine!

perl -MPDL -E 'say rcols(shift)->sum' datafile

rcols reads columns into a matrix (1D in this case) and sum (surprise) sums all the element of the matrix.

This works:

{ tr '\n' +; echo 0; } < file.txt | bc

Here is a solution using python with a generator expression. Tested with a million numbers on my old cruddy laptop.

time python -c "import sys; print sum((float(l) for l in sys.stdin))" < file

real    0m0.619s
user    0m0.512s
sys     0m0.028s

C always wins for speed:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    ssize_t read;
    char *line = NULL;
    size_t len = 0;
    double sum = 0.0;

    while (read = getline(&line, &len, stdin) != -1) {
        sum += atof(line);
    }

    printf("%f", sum);
    return 0;
}

Timing for 1M numbers (same machine/input as my python answer):

$ gcc sum.c -o sum && time ./sum < numbers 
5003371677.000000
real    0m0.188s
user    0m0.180s
sys     0m0.000s

Just for fun, let's benchmark it:

$ for ((i=0; i<1000000; i++)) ; do echo $RANDOM; done > random_numbers

$ time perl -nle '$sum += $_ } END { print $sum' random_numbers
16379866392

real    0m0.226s
user    0m0.219s
sys     0m0.002s

$ time awk '{ sum += $1 } END { print sum }' random_numbers
16379866392

real    0m0.311s
user    0m0.304s
sys     0m0.005s

$ time { { tr "\n" + < random_numbers ; echo 0; } | bc; }
16379866392

real    0m0.445s
user    0m0.438s
sys     0m0.024s

$ time { s=0;while read l; do s=$((s+$l));done<random_numbers;echo $s; }
16379866392

real    0m9.309s
user    0m8.404s
sys     0m0.887s

$ time { s=0;while read l; do ((s+=l));done<random_numbers;echo $s; }
16379866392

real    0m7.191s
user    0m6.402s
sys     0m0.776s

$ time { sed ':a;N;s/\n/+/;ta' random_numbers|bc; }
^C

real    4m53.413s
user    4m52.584s
sys 0m0.052s

I aborted the sed run after 5 minutes

None of the solution thus far use paste. Here's one:

paste -sd+ filename | bc

As an example, calculate Σn where 1<=n<=100000:

$ seq 100000 | paste -sd+ | bc -l
5000050000

(For the curious, seq n would print a sequence of numbers from 1 to n given a positive number n.)

$ perl -MList::Util=sum -le 'print sum <>' nums.txt

With Ruby:

ruby -e "File.read('file.txt').split.inject(0){|mem, obj| mem += obj.to_f}"

You can do it with Alacon - command-line utility for Alasql database.

It works with Node.js, so you need to install Node.js and then Alasql package:

To calculate sum from TXT file you can use the following command:

> node alacon "SELECT VALUE SUM([0]) FROM TXT('mydata.txt')"

I prefer to use R for this:

$ R -e 'sum(scan("filename"))'

More succinct:

# Ruby
ruby -e 'puts open("random_numbers").map(&:to_i).reduce(:+)'

# Python
python -c 'print(sum(int(l) for l in open("random_numbers")))'

Another option is to use jq:

$ seq 10|jq -s add
55

-s (--slurp) reads the input lines into an array.

I prefer to use GNU datamash for such tasks because it's more succinct and legible than perl or awk. For example

datamash sum 1 < myfile

where 1 denotes the first column of data.

Perl 6

say sum lines
~$ perl6 -e '.say for 0..1000000' > test.in

~$ perl6 -e 'say sum lines' < test.in
500000500000

It is not easier to replace all new lines by +, add a 0 and send it to the Ruby interpreter?

(sed -e "s/$/+/" file; echo 0)|irb

If you do not have irb, you can send it to bc, but you have to remove all newlines except the last one (of echo). It is better to use tr for this, unless you have a PhD in sed .

(sed -e "s/$/+/" file|tr -d "\n"; echo 0)|bc