I have the following Go code:
package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    reader := bufio.NewReader(os.Stdin)
    scanner := bufio.NewScanner(reader)
    for scanner.Scan() {
        fmt.Println(scanner.Text())
    }
}
and the following Python code:
import sys

for ln in sys.stdin:
    print ln,
Both simply read lines from standard input and print them to standard output. Yet the Python version takes only about a quarter of the time the Go version needs (tested on a 16-million-line text file, with output redirected to /dev/null). Why is that?
UPDATE: Following JimB's and siritinga's advice, I changed Go's output to a buffered writer. The Go version is now much faster, but still about 75% slower than the Python version.
package main

import (
    "bufio"
    "os"
)

func main() {
    reader := bufio.NewReader(os.Stdin)
    scanner := bufio.NewScanner(reader)
    writer := bufio.NewWriter(os.Stdout)
    defer writer.Flush() // write out anything still sitting in the buffer on exit
    for scanner.Scan() {
        writer.WriteString(scanner.Text() + "\n")
    }
}
As JimB said, stop using strings. Python 2.x strings are just raw bytes, whereas Go strings hold UTF-8 encoded text. That requires encoding, error checking and so on; on the other hand, you also get more features out of strings. Building each string also costs an extra memory allocation: in your loop, scanner.Text() copies every line out of the scanner's internal buffer into a freshly allocated string.
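If you want to see that allocation cost directly, here is a minimal benchmark sketch (the function names and sample input are my own invention, saved as a _test.go file): scanner.Text() copies every line into a new string, while scanner.Bytes() hands back the scanner's internal slice without allocating. Run it with go test -bench . -benchmem and compare the allocs/op column.

package main

import (
    "bufio"
    "strings"
    "testing"
)

// Sample input: 1,000 copies of one line.
var input = strings.Repeat("a reasonably long line of sample text\n", 1000)

// BenchmarkScanText allocates a fresh string per line via scanner.Text().
func BenchmarkScanText(b *testing.B) {
    for i := 0; i < b.N; i++ {
        scanner := bufio.NewScanner(strings.NewReader(input))
        for scanner.Scan() {
            _ = scanner.Text() // copies the line into a new string
        }
    }
}

// BenchmarkScanBytes reuses the scanner's internal buffer via scanner.Bytes().
func BenchmarkScanBytes(b *testing.B) {
    for i := 0; i < b.N; i++ {
        scanner := bufio.NewScanner(strings.NewReader(input))
        for scanner.Scan() {
            _ = scanner.Bytes() // no copy, no per-line allocation
        }
    }
}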
If you switch your Python implementation to Unicode strings (by upgrading to 3.x, or by using unicode strings on 2.x), its performance will tank. Conversely, if you make the Go version operate on raw bytes in the same way, you get much better performance:
package main

import (
    "bufio"
    "os"
)

func main() {
    reader := bufio.NewReader(os.Stdin)
    scanner := bufio.NewScanner(reader)
    writer := bufio.NewWriter(os.Stdout)
    defer writer.Flush() // write out anything still sitting in the buffer on exit
    newline := []byte("\n")
    for scanner.Scan() {
        writer.Write(scanner.Bytes()) // raw bytes, no string conversion
        writer.Write(newline)
    }
}
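One caveat with this version: the slice returned by scanner.Bytes() points into the scanner's internal buffer and may be overwritten by a subsequent call to Scan, so each line has to be written out (or copied) before the loop iterates, which is exactly what the code above does.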
On my system, using a word list with 65 million lines, Python:
real 0m12.724s
user 0m12.581s
sys 0m0.145s
And the Go version:
real 0m4.408s
user 0m4.276s
sys 0m0.135s
It should also be noted that, as far as performance comparisons go, this is not a good test case. It does not represent what a real application would do, which is to actually handle the data in some way.
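For what it's worth, a slightly more realistic variant would do at least some per-line work instead of copying the input verbatim; a made-up stand-in for "handling the data" might look like this sketch, which upper-cases each line:

package main

import (
    "bufio"
    "bytes"
    "os"
)

func main() {
    scanner := bufio.NewScanner(os.Stdin)
    writer := bufio.NewWriter(os.Stdout)
    defer writer.Flush()
    newline := []byte("\n")
    for scanner.Scan() {
        // Do some real work per line rather than just echoing it.
        writer.Write(bytes.ToUpper(scanner.Bytes()))
        writer.Write(newline)
    }
}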