Can this be made more efficient in Go?

I wrote a piece of code to imitate the standard grep command in Go, but it is much slower than grep. Could someone give me any advice? Here is the code:

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "strings"
    "sync"
)

func parse_args() (file, pat string) {
    if len(os.Args) < 3 {
        log.Fatal("usage: gorep2 <file_name> <pattern>")
    }

    file = os.Args[1]
    pat = os.Args[2]
    return
}

func readFile(file string, to chan<- string) {
    f, err := os.Open(file)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    freader := bufio.NewReader(f)
    for {
        line, er := freader.ReadBytes('\n')
        if er == nil {
            to <- string(line)
        } else {
            break
        }

    }
    close(to)
}

func grepLine(pat string, from <-chan string, result chan<- bool) {
    var wg sync.WaitGroup

    for line := range from {
        wg.Add(1)

        go func(l string) {
            defer wg.Done()
            if strings.Contains(l, pat) {
                result <- true
            }
        }(string(line))
    }

    wg.Wait()
    close(result)
}

func main() {
    file, pat := parse_args()
    text_chan := make(chan string, 10)
    result_chan := make(chan bool, 10)

    go readFile(file, text_chan)
    go grepLine(pat, text_chan, result_chan)

    var total uint = 0
    for r := range result_chan {
        if r == true {
            total += 1
        }
    }

    fmt.Printf("Total %d\n", total)
}

The time in Go:

>>> time gogrep /var/log/task.log DEBUG 

Total 21089

real    0m0.156s
user    0m0.156s
sys     0m0.015s

The time in grep:

>>> time grep DEBUG /var/log/task.log | wc -l

21089

real    0m0.069s
user    0m0.046s
sys     0m0.064s

For an easily reproducible benchmark, I counted the number of lines containing the text "and" in Shakespeare.

gogrep:

$ go build gogrep.go && time ./gogrep /home/peter/shakespeare.txt and 
Total 21851
real    0m0.613s
user    0m0.651s
sys     0m0.068s

grep:

$ time grep and /home/peter/shakespeare.txt | wc -l
21851
real    0m0.108s
user    0m0.107s
sys     0m0.014s

petergrep:

$ go build petergrep.go && time ./petergrep /home/peter/shakespeare.txt and 
Total 21851
real    0m0.098s
user    0m0.092s
sys     0m0.008s

petergrep is written in Go. It's fast.

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "log"
    "os"
)

func parse_args() (file, pat string) {
    if len(os.Args) < 3 {
        log.Fatal("usage: petergrep <file_name> <pattern>")
    }
    file = os.Args[1]
    pat = os.Args[2]
    return
}

func grepFile(file string, pat []byte) int64 {
    patCount := int64(0)
    f, err := os.Open(file)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        if bytes.Contains(scanner.Bytes(), pat) {
            patCount++
        }
    }
    if err := scanner.Err(); err != nil {
        fmt.Fprintln(os.Stderr, err)
    }
    return patCount
}

func main() {
    file, pat := parse_args()
    total := grepFile(file, []byte(pat))
    fmt.Printf("Total %d\n", total)
}

Data: Shakespeare: pg100.txt

Go regular expressions are fully UTF-8 aware, and I think that has some overhead. They also have a different theoretical basis (automata rather than backtracking), which means they always run in time proportional to the length of the input. It is noticeable that Go regexps just aren't as fast as the PCRE regexps used by other languages. If you look at the regexp test in the Benchmarks Game, you'll see what I mean.
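
For what it's worth, a fixed string like DEBUG doesn't need the regexp engine at all. Here is a rough sketch of the two options using only the standard library; matchLiteral is a hypothetical helper name and the log line is made-up sample data.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// matchLiteral is a hypothetical helper: it reports whether line
// contains the fixed string pat, without involving package regexp.
func matchLiteral(line, pat []byte) bool {
	return bytes.Contains(line, pat)
}

func main() {
	line := []byte("task 42 DEBUG started") // made-up sample line

	// Fixed string: a plain substring search is enough.
	fmt.Println(matchLiteral(line, []byte("DEBUG"))) // true

	// Real pattern: compile once, outside any per-line loop, and reuse it.
	re := regexp.MustCompile(`DEBUG|WARN`)
	fmt.Println(re.Match(line)) // true

	// regexp.QuoteMeta escapes metacharacters, so user input is
	// matched literally even when it goes through the regexp engine.
	lit := regexp.MustCompile(regexp.QuoteMeta("a.b"))
	fmt.Println(lit.Match([]byte("axb")), lit.Match([]byte("a.b"))) // false true
}
```

Whichever you choose, compiling the pattern once and matching on []byte keeps the per-line cost down.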

If you need a bit more speed, you can always use the PCRE library directly.

A data point on the relevance of UTF-8 in regexp parsing: I have a long-used custom Perl 5 script for source grepping. I recently modified it to support UTF-8 so it could match fancy Go symbol names. It ran a FULL ORDER OF MAGNITUDE slower in repeated tests. So while Go regexps do pay a price for the predictability of their runtime, we also have to factor UTF-8 handling into the equation.