Confusion about channels and parallelism

I'm teaching myself Go, and I'm a bit confused about parallelism and how it is implemented in Go.

Given the following example:

package main

import (
    "fmt"
    "sync"
    "math/rand"
    "time"
)


const (
    workers = 1
    rand_count = 5000000
)


func start_rand(ch chan int) {
    defer close(ch)
    var wg sync.WaitGroup
    wg.Add(workers)
    rand_routine := func(counter int) {
        defer wg.Done()
        for i:=0;i<counter;i++ {
            seed := time.Now().UnixNano()
            rand.Seed(seed)
            ch<-rand.Intn(5000)

        }
    }
    for i:=0; i<workers; i++ {
        go rand_routine(rand_count/workers)
    }
    wg.Wait()
}

func main() {
    start_time := time.Now()
    mychan := make(chan int, workers)
    go start_rand(mychan)
    var wg sync.WaitGroup
    wg.Add(workers)

    work_handler := func() {
        defer wg.Done()
        for {
            v, isOpen := <-mychan
            if !isOpen { break }
            fmt.Println(v)
        }
    }
    for i:=0;i<workers;i++ {
        go work_handler()
    }
    wg.Wait()
    elapsed_time := time.Since(start_time)
    fmt.Println("Done",elapsed_time)
}

This piece of code takes about one minute to run on my MacBook. I assumed that increasing the workers constant would launch additional goroutines and, since my laptop has multiple cores, shorten the execution time.

However, this is not the case: increasing workers does not reduce the execution time.

I was thinking that setting workers to 1 would create 1 goroutine to generate the random numbers, and setting it to 4 would create 4 goroutines. Given the multicore nature of my laptop, I was expecting the 4 workers to run on different cores and therefore increase the performance. However, I see increased load on all my cores, even when workers is set to 1. What am I missing here?

Your code has some issues that make it inherently slow:

You are seeding inside the loop. This only needs to be done once.
You are using the same source of random numbers for every worker. This source is thread safe, so concurrent workers contend for it, which takes away any performance gain. You could create a separate source for each worker with rand.New.
You are printing a lot. Printing is thread safe, too, so it also takes away any speed gain from concurrent workers.
As Zak already pointed out, the work done inside the goroutines is very cheap, while the communication is expensive.

You could rewrite your program like this. Then you will see some speed gains when you change the number of workers:

package main

import (
    "fmt"
    "math/rand"
    "time"
)

const (
    workers   = 1
    randCount = 5000000
)

var results = [randCount]int{}

func randRoutine(start, counter int, c chan bool) {
    r := rand.New(rand.NewSource(time.Now().UnixNano()))
    for i := 0; i < counter; i++ {
        results[start+i] = r.Intn(5000)
    }
    c <- true
}

func main() {
    startTime := time.Now()
    c := make(chan bool)

    start := 0
    for w := 0; w < workers; w++ {
        go randRoutine(start, randCount/workers, c)
        start += randCount / workers
    }

    for i := 0; i < workers; i++ {
        <-c
    }

    elapsedTime := time.Since(startTime)
    for _, i := range results {
        fmt.Println(i)
    }
    fmt.Println("Time calculating", elapsedTime)

    elapsedTime = time.Since(startTime)
    fmt.Println("Total time", elapsedTime)
}

This program does a lot of work in each goroutine and communicates minimally. Also, a different random source is used for each goroutine.
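If the values still have to travel over a channel, as in the original program, one middle ground is to send them in batches so that the channel is used once per slice rather than once per number. The following is only a sketch under assumptions: the names produce and generate, the batchSize constant, and the fixed seeds are hypothetical and not part of the program above.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// batchSize is a hypothetical tuning knob, not a value taken from the answer.
const batchSize = 1024

// produce sends count random numbers over out in slices of up to batchSize,
// so the channel is used once per batch instead of once per number.
func produce(seed int64, count int, out chan<- []int, wg *sync.WaitGroup) {
	defer wg.Done()
	// Private source per worker. The seed is fixed here for reproducibility;
	// the program above seeds from time.Now().UnixNano() instead.
	r := rand.New(rand.NewSource(seed))
	for count > 0 {
		n := batchSize
		if count < n {
			n = count
		}
		batch := make([]int, n)
		for i := range batch {
			batch[i] = r.Intn(5000)
		}
		out <- batch
		count -= n
	}
}

// generate starts the given number of workers and returns how many values
// arrived on the channel in total.
func generate(workers, total int) int {
	out := make(chan []int, workers)
	var wg sync.WaitGroup
	wg.Add(workers)
	for w := 0; w < workers; w++ {
		share := total / workers
		if w == workers-1 {
			share += total % workers // last worker takes the remainder
		}
		go produce(int64(w+1), share, out, &wg)
	}
	// Close the channel once every producer has finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	received := 0
	for batch := range out {
		received += len(batch)
	}
	return received
}

func main() {
	fmt.Println("values received:", generate(4, 5000000))
}
```

A larger batchSize means fewer channel operations per generated number, at the cost of more memory held per batch.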

Your code does not have just a single goroutine, even though you set workers to 1.

There is 1 goroutine from the call go start_rand(...). That goroutine creates N (workers) goroutines with go rand_routine(...) and waits for them to finish.

Then you also start N (workers) goroutines with go work_handler().

Then you also have the 1 goroutine that runs main() itself.

So there are 1 + 2N + 1 goroutines running for any given N, where N == workers.
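The count can be checked with runtime.NumGoroutine. The sketch below only mirrors the shape of the question's program: the function countGoroutines and the park helper are hypothetical stand-ins, and every goroutine simply waits so that all of them are alive when the count is taken.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// countGoroutines launches one starter goroutine that spawns n producers,
// plus n consumers, and reports how many goroutines exist while they are
// all parked, including the caller's own goroutine.
func countGoroutines(n int) int {
	before := runtime.NumGoroutine() // the caller and anything already running
	started := make(chan struct{})
	release := make(chan struct{})
	var wg sync.WaitGroup

	// park announces that the goroutine is running, then waits for release.
	park := func() {
		defer wg.Done()
		started <- struct{}{}
		<-release
	}

	wg.Add(1 + 2*n)
	go func() { // stands in for start_rand
		for i := 0; i < n; i++ {
			go park() // stands in for rand_routine
		}
		park()
	}()
	for i := 0; i < n; i++ {
		go park() // stands in for work_handler
	}

	// Wait until every goroutine has reported in, then take the count.
	for i := 0; i < 1+2*n; i++ {
		<-started
	}
	launched := runtime.NumGoroutine() - before

	close(release)
	wg.Wait()
	// Give the released goroutines a moment to exit so that a later call
	// starts from a clean baseline.
	for i := 0; i < 1000 && runtime.NumGoroutine() > before; i++ {
		runtime.Gosched()
	}
	return launched + 1 // +1 for the goroutine running main
}

func main() {
	fmt.Println("workers = 1:", countGoroutines(1), "goroutines")
}
```

For n workers the function should report 1 + 2n + 1, so even a single worker leaves four goroutines for the scheduler to place.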

On top of that, the work that you are doing in the goroutines is pretty cheap (fast to execute). You are just generating random numbers.

If you look at the blocking and scheduler latency profiles of the program:

[Images: blocking profile and scheduler latency profile]

You can see from both profiles that most of the time is spent in the concurrency constructs. This suggests there is a lot of contention in your program. While goroutines are cheap, there is still some blocking and synchronisation that needs to be done when sending a value over a channel. This can take up a large proportion of the program's running time when the work being done by the producer is very fast / cheap.
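One way to collect a blocking profile yourself, not necessarily how the profiles discussed here were produced, is through runtime.SetBlockProfileRate and the "block" profile in runtime/pprof. The sketch below is an assumption-laden miniature: pingPong and blockedStacks are hypothetical names, and the workload is just cheap values pushed through an unbuffered channel.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// pingPong pushes n cheap values through an unbuffered channel, so the sender
// and the receiver spend much of their time waiting on each other.
func pingPong(n int) int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for i := 0; i < n; i++ {
			ch <- i
		}
	}()
	sum := 0
	for v := range ch {
		sum += v
	}
	return sum
}

// blockedStacks records blocking events while pingPong runs. It returns the
// number of distinct stacks in the block profile and the profile as text.
func blockedStacks(n int) (int, string) {
	runtime.SetBlockProfileRate(1)       // sample every blocking event
	defer runtime.SetBlockProfileRate(0) // switch profiling off again

	pingPong(n)

	p := pprof.Lookup("block")
	var buf bytes.Buffer
	if err := p.WriteTo(&buf, 1); err != nil {
		return p.Count(), ""
	}
	return p.Count(), buf.String()
}

func main() {
	stacks, dump := blockedStacks(100000)
	fmt.Println("stacks that blocked:", stacks)
	fmt.Println(dump)
}
```

The text dump lists the call stacks where goroutines waited; in a program dominated by channel traffic, those stacks are expected to point at the send and receive operations.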

To answer your original question, you see load on many cores because you have more than a single goroutine running.
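How widely the scheduler may spread those goroutines depends on GOMAXPROCS, which normally defaults to the number of logical CPUs. As a small illustration (the helper name schedulerWidth is hypothetical), both values can be printed like this:

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerWidth reports the number of logical CPUs and the current
// GOMAXPROCS setting. Passing 0 to runtime.GOMAXPROCS only queries the
// value; it does not change it.
func schedulerWidth() (cpus, procs int) {
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	cpus, procs := schedulerWidth()
	fmt.Printf("logical CPUs: %d, GOMAXPROCS: %d\n", cpus, procs)
}
```

With GOMAXPROCS greater than 1, the several goroutines that exist even at workers = 1 can run on different cores at the same time.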