如何对修改输入的函数进行基准测试?

When I'm benchmarking a function that modifies its input, I have to copy the test data for each loop of the benchmark, and pause the timer while I'm doing so. This can mean that if I run go test -bench MyTest -benchtime 1s the test can take 2 full minutes rather than 1 second.

Am I doing something wrong or will I just have to live with this?


More context:

I'm writing a program for reading syslog logs. Part of my logging paradigm is that the first line of a logged message contains readable text, and following lines contain "extra information", like a stack trace. My log reader therefore (among other things) splits the message on the first line break, which is escaped to #012 by rsyslog.

Here is the code for that:

// Splits the main line from extra information
func splitMessageExtra(line *string) string {
    var prev rune

    for i, char := range *line {
        if prev == 0 && char == '#' {
            prev = char
            continue
        }

        if prev == '#' && char == '0' {
            prev = char
            continue
        }

        if prev == '0' && char == '1' {
            prev = char
            continue
        }

        if prev == '1' && char == '2' {
            extra := (*line)[i+1:]
            *line = (*line)[0 : i-3]

            return extra
        }

        prev = 0
    }

    return ""
}

It originally used strings.Split and returned new strings, but cpu profiling showed that it was way too slow.

Here is the benchmark function:

var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread "processor_mta03"]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=<>, size=24431, nrcpt=1 (queue active)`

func BenchmarkSplitMessageExtra(b *testing.B) {
    for i := 0; i < b.N; i++ {
        b.StopTimer()
        msg := string([]byte(testMessage))
        b.StartTimer()

        splitMessageExtra(&msg)
    }
}

Here's a run without pausing the timer:

$ go test -bench SplitMessageExtra -benchtime 1s
BenchmarkSplitMessageExtra-8     3000000           434 ns/op
PASS
ok      github.com/Hubro/logreader  1.730s

And here's a run with the exact benchmark function above:

$ go test -bench SplitMessageExtra -benchtime 1s
BenchmarkSplitMessageExtra-8     5000000           385 ns/op
PASS
ok      github.com/Hubro/logreader  100.563s

Notice it takes AGES to run.

Your code and benchmark do seem slow. Here's a faster version.

package main

import (
    "strings"
    "testing"
)

// Splits the main line from extra information
func splitMessageExtra(line *string) string {
    const newline = "#012"
    i := strings.Index(*line, newline)
    if i < 0 {
        return ""
    }
    extra := (*line)[i+len(newline):]
    *line = (*line)[0:i]
    return extra
}

var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread "processor_mta03"]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=<>, size=24431, nrcpt=1 (queue active)`

func BenchmarkSplitMessageExtra(b *testing.B) {
    for i := 0; i < b.N; i++ {
        msg := testMessage
        splitMessageExtra(&msg)
    }
}

Output:

$ go test -bench=.
goos: linux
goarch: amd64
pkg: extra
BenchmarkSplitMessageExtra-4    50000000            32.2 ns/op
PASS
ok      extra   1.647s

For comparison, here are the results from your code and benchmark. Your code and benchmark are slower than mine: 968 ns/op and 50.184s versus 32.2 ns/op and 1.647s respectively.

package main

import (
    "testing"
)

// Splits the main line from extra information
func splitMessageExtra(line *string) string {
    var prev rune
    for i, char := range *line {
        if prev == 0 && char == '#' {
            prev = char
            continue
        }
        if prev == '#' && char == '0' {
            prev = char
            continue
        }
        if prev == '0' && char == '1' {
            prev = char
            continue
        }
        if prev == '1' && char == '2' {
            extra := (*line)[i+1:]
            *line = (*line)[0 : i-3]

            return extra
        }
        prev = 0
    }
    return ""
}

var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread "processor_mta03"]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=<>, size=24431, nrcpt=1 (queue active)`

func BenchmarkSplitMessageExtra(b *testing.B) {
    for i := 0; i < b.N; i++ {
        b.StopTimer()
        msg := string([]byte(testMessage))
        b.StartTimer()
        splitMessageExtra(&msg)
    }
}

Output:

$ go test -bench=.
goos: linux
goarch: amd64
pkg: extra
BenchmarkSplitMessageExtra-4     2000000           968 ns/op    
PASS
ok      extra   50.184s

Some of your code is unnecessary; it uses CPU time and triggers allocations. For example, converting to utf-8 bytes to runes,for i, char := range *line {}, and converting string to []byte to string, string([]byte(testMessage)) . Some algorithms could be improved. For example, searching for a newline.

There is nothing wrong. StopTimer and StartTimer should be much more expensive than splitMessageExtra. Both of them call runtime.ReadMemStats. See here.