Goroutine execution times with different input data

I am experimenting with goroutines to parallelize some computation. However, the execution times of the goroutines confuse me. My experiment setup is simple:

package main

import (
    "fmt"
    "math"
    "runtime"
    "time"
)

func main() {
    runtime.GOMAXPROCS(3)

    datalen := 1000000000
    data21 := make([]float64, datalen)
    data22 := make([]float64, datalen)
    data23 := make([]float64, datalen)
    _ = data21 // allocated but never written, so the memory
    _ = data23 // footprint matches the second example below

    t := time.Now()
    res := make(chan interface{}, 3)

    // All three goroutines write to the same slice, data22.
    go func() {
        for i := 0; i < datalen; i++ {
            data22[i] = math.Sqrt(13)
        }
        res <- true
    }()

    go func() {
        for i := 0; i < datalen; i++ {
            data22[i] = math.Sqrt(13)
        }
        res <- true
    }()

    go func() {
        for i := 0; i < datalen; i++ {
            data22[i] = math.Sqrt(13)
        }
        res <- true
    }()

    for i := 0; i < 3; i++ {
        <-res
    }
    fmt.Printf("The parallel for loop took %v to run.\n", time.Since(t))
}

Notice that all three goroutines write to the same slice, data22. The execution time for this program is

The parallel for loop took 7.436060182s to run.

However, if I let each goroutine handle a different slice, as follows:

package main

import (
    "fmt"
    "math"
    "runtime"
    "time"
)

func main() {
    runtime.GOMAXPROCS(3)

    datalen := 1000000000
    data21 := make([]float64, datalen)
    data22 := make([]float64, datalen)
    data23 := make([]float64, datalen)

    t := time.Now()
    res := make(chan interface{}, 3)

    // Each goroutine writes to its own slice.
    go func() {
        for i := 0; i < datalen; i++ {
            data21[i] = math.Sqrt(13)
        }
        res <- true
    }()

    go func() {
        for i := 0; i < datalen; i++ {
            data22[i] = math.Sqrt(13)
        }
        res <- true
    }()

    go func() {
        for i := 0; i < datalen; i++ {
            data23[i] = math.Sqrt(13)
        }
        res <- true
    }()

    for i := 0; i < 3; i++ {
        <-res
    }
    fmt.Printf("The parallel for loop took %v to run.\n", time.Since(t))
}

The execution time for this version is almost 3 times longer than the previous one, and is about equal to, or even worse than, sequential execution without goroutines (sketched below):

The parallel for loop took 20.744438468s to run.
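For reference, by "sequential execution without goroutines" I mean filling the three slices one after another; this is a sketch of it rather than my exact code:

// Sequential baseline: same total amount of writes, one core.
t := time.Now()
for i := 0; i < datalen; i++ {
    data21[i] = math.Sqrt(13)
}
for i := 0; i < datalen; i++ {
    data22[i] = math.Sqrt(13)
}
for i := 0; i < datalen; i++ {
    data23[i] = math.Sqrt(13)
}
fmt.Printf("The sequential loop took %v to run.\n", time.Since(t))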

I guess I am using goroutines in the wrong way. So what is the correct way to use multiple goroutines to handle different pieces of data?

Since your example program is not performing any substantial computation, the bottleneck is going to be the speed at which data can be written to memory. With the settings in the example, we're talking about roughly 22 GiB of writes (3 slices × 10⁹ float64 values × 8 bytes each ≈ 24 × 10⁹ bytes), which is not insignificant.

Given the difference in run time between the two examples, one likely possibility is that the first example isn't actually writing that much to RAM. Since memory writes are cached by the CPU, the execution probably looks something like this:

  1. the first goroutine writes out data to a cache line representing the start of the data22 array.
  2. the second goroutine writes out data to a cache line representing the same location. The CPU running the first goroutine notices that the write invalidates its own cached write, so throws away its changes.
  3. the third goroutine writes out data to a cache line representing the same location. The CPU running the second goroutine notices that the write invalidates its own cached write, so throws away its changes.
  4. the cache line in the third CPU is evicted and the changes are written out to RAM.

This process continues as the goroutines progress through the data22 array. Since RAM is the bottleneck and only about one third as much data ends up being written to RAM in the first scenario, it isn't that surprising that it runs approximately 3 times as fast as the second one.
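As for the correct way to use goroutines on different pieces of data: a common pattern is to split one slice into contiguous, non-overlapping chunks and give each goroutine its own chunk, so that no two goroutines write to the same cache line except possibly at chunk boundaries. The following is a minimal sketch of that pattern (the chunking arithmetic, worker count, and the smaller datalen are my own choices, not code from the question):

package main

import (
    "fmt"
    "math"
    "runtime"
    "sync"
    "time"
)

func main() {
    workers := runtime.GOMAXPROCS(0) // query the current setting
    datalen := 100000000             // 1e8: comfortably fits in RAM
    data := make([]float64, datalen)

    t := time.Now()
    var wg sync.WaitGroup
    chunk := (datalen + workers - 1) / workers // ceil(datalen / workers)
    for w := 0; w < workers; w++ {
        lo := w * chunk
        hi := lo + chunk
        if hi > datalen {
            hi = datalen
        }
        wg.Add(1)
        go func(lo, hi int) {
            defer wg.Done()
            // Each goroutine writes only to its own disjoint region,
            // so the goroutines never invalidate each other's cache
            // lines (except possibly at the chunk boundaries).
            for i := lo; i < hi; i++ {
                data[i] = math.Sqrt(13)
            }
        }(lo, hi)
    }
    wg.Wait()
    fmt.Printf("The chunked parallel loop took %v to run.\n", time.Since(t))
}

Because each goroutine owns a disjoint region, the invalidation ping-pong described in the list above cannot occur, and every byte written is a byte that actually needs to reach RAM.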

You are using enormous amounts of memory: 1000000000 * 8 = 8 GB is actually touched in the first example (the other two slices are allocated but never written, so their pages need never be faulted in), and 3 * 1000000000 * 8 = 24 GB in the second example. In the second example you are probably using lots of swap space, and disk I/O is very, very slow, even on an SSD.

Change datalen := 1000000000 to datalen := 100000000, a 10-fold decrease. What are your run times now? Average at least three runs of each example. How much memory does your computer have? Are you using an SSD?
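If you would rather not average timings by hand, Go's testing package will repeat the measurement for you. A minimal benchmark sketch, assuming you move the loop into a helper (the fill function and the file name fill_test.go are illustrative, not from the question):

// file: fill_test.go
package fill

import (
    "math"
    "testing"
)

const datalen = 100000000 // the suggested 10-fold decrease

// fill is an illustrative helper wrapping the loop from the question.
func fill(data []float64) {
    for i := range data {
        data[i] = math.Sqrt(13)
    }
}

func BenchmarkFill(b *testing.B) {
    data := make([]float64, datalen)
    b.ResetTimer() // exclude the allocation from the measurement
    for n := 0; n < b.N; n++ {
        fill(data)
    }
}

Running go test -bench=Fill reports an average ns/op over b.N iterations, which makes before/after comparisons with the smaller datalen easier to trust.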