为什么会导致在创建goroutine时所用的时间不同?

Consider the following application, designed to measure goroutine creation latency. Assume that we are running with GOMAXPROCS=2.

package main

import "fmt"
import "time"

const numRuns = 10000

type  timeRecord struct{
    Ts time.Time
    Msg string
}

var timeStamps []timeRecord

func threadMain(done chan bool) {
    timeStamps = append(timeStamps, timeRecord{time.Now(), "Inside thread"})
    done <- true
}

func main() {
    timeStamps = make([]timeRecord, 0, numRuns*2)
    done := make(chan bool)
    dummy := 0
    for i := 0; i < numRuns; i++ {
        timeStamps = append(timeStamps, timeRecord{time.Now(), "Before creation"})
        go threadMain(done)
        <-done
    }

    // Regularize
    regularizedTime := make([]time.Duration, numRuns*2)
    for i := 0; i < len(timeStamps) ; i++ {
        regularizedTime[i] = timeStamps[i].Ts.Sub(timeStamps[0].Ts)
    }

    // Fake timetraced
    fmt.Printf("%6d ns (+%6d ns): %s
", 0, 0, timeStamps[0].Msg)
    for i := 1; i < len(timeStamps) ; i++ {
        fmt.Printf("%8d ns (+%6d ns): %s
", regularizedTime[i], (regularizedTime[i] - regularizedTime[i-1]).Nanoseconds(), timeStamps[i].Msg)
    }
}

On my server, this consistently outputs roughly a median 260 ns delta from Before creation to Inside thread. Now consider the following variation of the main method.

timeStamps = make([]timeRecord, 0, numRuns*2)
done := make(chan bool)
dummy := 0
for i := 0; i < numRuns; i++ {
    timeStamps = append(timeStamps, timeRecord{time.Now(), "Before creation"})
    go threadMain(done)
    for j := 0; j < 1000; j++ {
        dummy += j
    }
    <-done
}

Under this variation, the same time delta takes roughly 890 ns.

Obviously, the exact numbers are machine-specific, but the difference between the numbers is curious. Logically, if I am measuring between "Before creation" and "Inside thread", adding extra logic after the go statement seems like it should not increase that time, but it does.

Does anyone have a good idea for why the time increase is not occurring in the expected location?

Go scheduler is cooperative. It can switch the current goroutine to another one only at certain points of the program such as function calls, reading/writing to a channel, etc. I expect that the difference you observe is due to the fact that the goroutines are started later (after the added for loop).