Consider the following application, designed to measure goroutine
creation latency. Assume that we are running with GOMAXPROCS=2
.
package main
import "fmt"
import "time"
const numRuns = 10000
type timeRecord struct{
Ts time.Time
Msg string
}
var timeStamps []timeRecord
func threadMain(done chan bool) {
timeStamps = append(timeStamps, timeRecord{time.Now(), "Inside thread"})
done <- true
}
func main() {
timeStamps = make([]timeRecord, 0, numRuns*2)
done := make(chan bool)
dummy := 0
for i := 0; i < numRuns; i++ {
timeStamps = append(timeStamps, timeRecord{time.Now(), "Before creation"})
go threadMain(done)
<-done
}
// Regularize
regularizedTime := make([]time.Duration, numRuns*2)
for i := 0; i < len(timeStamps) ; i++ {
regularizedTime[i] = timeStamps[i].Ts.Sub(timeStamps[0].Ts)
}
// Fake timetraced
fmt.Printf("%6d ns (+%6d ns): %s
", 0, 0, timeStamps[0].Msg)
for i := 1; i < len(timeStamps) ; i++ {
fmt.Printf("%8d ns (+%6d ns): %s
", regularizedTime[i], (regularizedTime[i] - regularizedTime[i-1]).Nanoseconds(), timeStamps[i].Msg)
}
}
On my server, this consistently outputs roughly a median 260 ns delta from Before creation
to Inside thread
. Now consider the following variation of the main method.
timeStamps = make([]timeRecord, 0, numRuns*2)
done := make(chan bool)
dummy := 0
for i := 0; i < numRuns; i++ {
timeStamps = append(timeStamps, timeRecord{time.Now(), "Before creation"})
go threadMain(done)
for j := 0; j < 1000; j++ {
dummy += j
}
<-done
}
Under this variation, the same time delta takes roughly 890 ns.
Obviously, the exact numbers are machine-specific, but the difference between the numbers is curious. Logically, if I am measuring between "Before creation" and "Inside thread", adding extra logic after the go
statement seems like it should not increase that time, but it does.
Does anyone have a good idea for why the time increase is not occurring in the expected location?
Go scheduler is cooperative. It can switch the current goroutine to another one only at certain points of the program such as function calls, reading/writing to a channel, etc. I expect that the difference you observe is due to the fact that the goroutines are started later (after the added for loop).