Why does myVar = strings.Fields(scanner.Text()) take more time than the comparable operation in Python?

Consider the following Go code:

now := time.Now()
sec1 := now.Unix()

file, err := os.Open(file_name)
if err != nil {
    log.Fatal(err)
}
defer file.Close()

scanner := bufio.NewScanner(file)

var parsedLine []string

for scanner.Scan() {
    parsedLine = strings.Fields(scanner.Text())
}

fmt.Println(parsedLine)
now2 := time.Now()
sec2 := now2.Unix()
fmt.Println(sec2 - sec1) // takes 24 seconds for file1.txt

And consider this Python program:

start = time.time()

with open(file) as f:
    for line in f:
        parsedLine = line.split()

end = time.time()
print end - start # takes 4.6450419426 seconds for file1.txt

I observe that the Go program is about 5 times slower than the Python program on a MacBook Pro.

Specifically this line

parsedLine = strings.Fields(scanner.Text())

is very slow.

If I change that line in Go to

if strings.Contains(scanner.Text(), "string_that_never_exist") {
     continue
}
// takes less than 1 second

and the Python line to

if "string_that_never_exist" in line:
    continue
# takes 2.86928987503 seconds

the Go version is now much faster than the Python one.

I am slightly perplexed as to why strings.Fields(scanner.Text()) would be slower than line.split().

I feel I am missing something silly. Can someone point out why the Go version takes longer than the Python one?

You're using Unicode in Go and byte strings in Python. Your Python code has a much easier job, since it doesn't have to decode UTF-8 or recognize every Unicode whitespace character.

Any benchmark should be a good scientific experiment. It must be reproducible.

First, define the readily available input:

The Complete Works of William Shakespeare by William Shakespeare:

http://www.gutenberg.org/files/100/100-0.txt

Next, fully define the executable programs:

linesplit.py:

import time
start = time.time()

# http://www.gutenberg.org/files/100/100-0.txt
file = "/home/peter/shakespeare.100-0.txt"
with open(file) as f:
    for line in f:
        parsedLine = line.split()

end = time.time() 
print(end - start)

linesplit.go:

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "strings"
    "time"
)

func main() {
    now := time.Now()
    sec1 := now.Unix()

    // http://www.gutenberg.org/files/100/100-0.txt
    file_name := "/home/peter/shakespeare.100-0.txt"
    file, err := os.Open(file_name)
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    scanner := bufio.NewScanner(file)

    var parsedLine []string

    for scanner.Scan() {
        parsedLine = strings.Fields(scanner.Text())
    }

    fmt.Println(parsedLine)
    now2 := time.Now()
    sec2 := now2.Unix()
    fmt.Println(sec2 - sec1) // whole-second resolution only; see time.Since below
    fmt.Println(time.Since(now))
}

Then, provide the benchmark results:

$ python2 --version
Python 2.7.14
$ time python2 linesplit.py
0.07024809169769
real    0m0.089s
user    0m0.089s
sys     0m0.000s

$ python3 --version
Python 3.6.3
$ time python3 linesplit.py
0.12172794342041016
real    0m0.159s
user    0m0.155s
sys     0m0.004s

$ go version
go version devel +39ad208c13 Tue Jun 12 19:10:34 2018 +0000 linux/amd64
$ go build linesplit.go && time ./linesplit
[]
1
91.833622ms
real    0m0.100s
user    0m0.094s
sys     0m0.004s

We have Python 2 < Go < Python 3, or 0.0702 < 0.0918 < 0.1217 seconds; as a ratio, 1.00 < 1.31 < 1.73. Python 2 works on byte strings (ASCII). Go and Python 3 work on Unicode.