How to scale the number of goroutines on a multi-core machine for maximum throughput

I'm running on a multi-core machine and have a bunch of goroutines waiting on a channel for CPU-intensive tasks. What is the optimal number of goroutines to use in order to maximize throughput (tasks per second)? Should it equal the number of cores, be proportional to the number of cores, or something else?

I think you are missing the point of goroutines: they are not OS threads, and you should not have to care about how many there are (until you reach something like a million goroutines). Creating somewhat fewer or more of them barely changes performance, because the Go runtime takes care of scheduling them onto real OS threads.

The number of OS threads that can execute Go code at the same time is controlled by GOMAXPROCS (you can set it programmatically with runtime.GOMAXPROCS or as an environment variable). It defaults to the number of cores on your machine.

The point is that only runnable goroutines are scheduled onto OS threads; blocked ones (for example, one waiting on a socket) do not consume any CPU.

Since your tasks are CPU-bound (rather than I/O-bound), you probably won't benefit from having many more goroutines than cores: they rarely block, so extra goroutines only add switching overhead. If all your tasks take roughly the same time, benchmarking with a few goroutine counts that are multiples of GOMAXPROCS is a reasonable place to start.
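Something like the following could serve as a starting point for that benchmark. It is only a rough sketch under assumptions: burn is a made-up stand-in for your CPU-intensive task, runPool is a hypothetical helper, and the task counts and sizes are arbitrary. Replace them with your real workload before drawing any conclusions.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// sink keeps the results alive so the work is not optimized away.
var sink uint64

// burn is a hypothetical stand-in for a CPU-intensive task.
func burn(n int) uint64 {
	var h uint64 = 14695981039346656037
	for i := 0; i < n; i++ {
		h ^= uint64(i)
		h *= 1099511628211
	}
	return h
}

// runPool starts `workers` goroutines that all receive from one channel,
// feeds them `tasks` jobs of the given size, and returns how many jobs
// completed and how long the whole run took.
func runPool(workers, tasks, size int) (int, time.Duration) {
	jobs := make(chan int)
	var wg sync.WaitGroup
	var mu sync.Mutex
	done := 0

	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			local, sum := 0, uint64(0)
			for n := range jobs {
				sum ^= burn(n)
				local++
			}
			// Merge per-worker results once, to keep lock traffic low.
			mu.Lock()
			done += local
			sink ^= sum
			mu.Unlock()
		}()
	}
	for i := 0; i < tasks; i++ {
		jobs <- size
	}
	close(jobs)
	wg.Wait()
	return done, time.Since(start)
}

func main() {
	procs := runtime.GOMAXPROCS(0)
	// Try a few worker counts that are multiples of GOMAXPROCS.
	for _, mult := range []int{1, 2, 4, 8} {
		workers := procs * mult
		done, elapsed := runPool(workers, 1000, 100000)
		fmt.Printf("workers=%-4d tasks/sec=%.0f\n",
			workers, float64(done)/elapsed.Seconds())
	}
}
```

If the tasks/sec figures are roughly flat across the multiples, that supports the point above that the exact goroutine count matters little for CPU-bound work once every core is kept busy.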