I have an additional question concerning my previous post Processing array in Go parallel: imagine that my arrays are very large, for example
a1 := []int{0, 1, 2, 3, 4...1000}
a2 := []int{10, 20, 30, 40, 50...10000}
and I have only 4 CPUs:
runtime.GOMAXPROCS(4)
var wg sync.WaitGroup
Is the following code still correct?
for i := 1; i < 1000; i++ {
    wg.Add(1)
    go func(i int) {
        defer wg.Done()
        x := process_array(a1[i], a2[i])
        fmt.Println(a1[i], "+", a2[i], "=", x)
    }(i)
}
wg.Wait()
In other words, will runtime.GOMAXPROCS(4) limit the number of threads to 4, or will there be a problem of "accumulation" of 1000 threads? Thanks for your comments!
Your for loop will create 1000 goroutines; runtime.GOMAXPROCS(4) only sets the number of CPUs that can be used. From the documentation for runtime.GOMAXPROCS:
GOMAXPROCS sets the maximum number of CPUs that can be executing simultaneously and returns the previous setting. If n < 1, it does not change the current setting. The number of logical CPUs on the local machine can be queried with NumCPU. This call will go away when the scheduler improves.
and on the same page:
The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit. This package's GOMAXPROCS function queries and changes the limit.
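So your loop really will create all 1000 goroutines; GOMAXPROCS(4) only bounds how many of them execute Go code at the same moment. Goroutines are cheap, so 1000 of them is usually harmless, but if you do want to cap how many are in flight, a common idiom is a buffered channel used as a semaphore. A minimal sketch, reusing the a1, a2 and process_array from your question:

sem := make(chan struct{}, 4) // capacity 4 = at most 4 goroutines at once
var wg sync.WaitGroup
for i := 0; i < len(a1); i++ {
    sem <- struct{}{} // blocks while 4 goroutines are already running
    wg.Add(1)
    go func(i int) {
        defer wg.Done()
        defer func() { <-sem }() // free the slot when done
        x := process_array(a1[i], a2[i])
        fmt.Println(a1[i], "+", a2[i], "=", x)
    }(i)
}
wg.Wait()

Because the send on sem happens before the goroutine is spawned, no more than 4 goroutines ever exist at once, so nothing accumulates.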
When writing parallel code to improve speed, always remember Amdahl's Law. It gives a useful rule of thumb for when to stop bothering, and can be paraphrased as 'the sequential bits will become the bottleneck'.
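Quantitatively: if a fraction p of the program can be parallelized over n processors, the best possible speedup is

    speedup = 1 / ((1 - p) + p/n)

so with p = 0.9 and n = 4 CPUs you get at most about 3.1x, no matter how many goroutines you launch.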
If you ignore Amdahl's Law, you may waste your time chasing impossible objectives. Instead, think about the broader concurrency of your program: you may need to attack the performance problem in more than one place, or in more than one way.
Generally, the approach you are using is data-parallel: the "geometrical" decomposition of independent segments of data structures across multiple processes.
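Concretely (a sketch, again borrowing a1, a2 and process_array from your question), a data-parallel version of your loop would give each CPU one contiguous chunk of the index range instead of spawning one goroutine per element:

numWorkers := runtime.GOMAXPROCS(0) // with n < 1 this queries without changing the setting
chunk := (len(a1) + numWorkers - 1) / numWorkers
var wg sync.WaitGroup
for w := 0; w < numWorkers; w++ {
    lo, hi := w*chunk, (w+1)*chunk
    if hi > len(a1) {
        hi = len(a1)
    }
    wg.Add(1)
    go func(lo, hi int) {
        defer wg.Done()
        for i := lo; i < hi; i++ { // each goroutine owns an independent segment
            x := process_array(a1[i], a2[i])
            fmt.Println(a1[i], "+", a2[i], "=", x)
        }
    }(lo, hi)
}
wg.Wait()

This creates only as many goroutines as there are CPUs, so there is no accumulation at all.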
You might also consider function decompositions (essentially pipelines) where different stages do different work.
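For instance, a two-stage sketch over the same data (the stage split here is illustrative, not prescriptive): one goroutine generates indices, another processes them, and main prints the results.

indices := make(chan int)
go func() { // stage 1: generate work items
    for i := 0; i < len(a1); i++ {
        indices <- i
    }
    close(indices)
}()

results := make(chan string)
go func() { // stage 2: process each item
    for i := range indices {
        results <- fmt.Sprintln(a1[i], "+", a2[i], "=", process_array(a1[i], a2[i]))
    }
    close(results)
}()

for line := range results { // final stage: print
    fmt.Print(line)
}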
Then there is the special temporal case, using master-worker or 'data farming' as a way of achieving parallelism.
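A minimal master-worker sketch in Go, once more assuming the question's a1, a2 and process_array: the master feeds indices into a jobs channel and a fixed pool of workers drains it.

jobs := make(chan int)
var wg sync.WaitGroup
for w := 0; w < 4; w++ { // a fixed pool of 4 workers, matching GOMAXPROCS(4)
    wg.Add(1)
    go func() {
        defer wg.Done()
        for i := range jobs { // each worker pulls the next free index
            x := process_array(a1[i], a2[i])
            fmt.Println(a1[i], "+", a2[i], "=", x)
        }
    }()
}
for i := 0; i < len(a1); i++ { // the master hands out the work
    jobs <- i
}
close(jobs)
wg.Wait()

Because idle workers pull the next job as soon as they finish, this balances the load automatically even when process_array takes a different amount of time per element.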
All these tend to need truly parallel hardware to be seriously useful. A good, but old, summary of multiprocessing using these techniques is in Tidmus/Chalmers Practical Parallel Processing: An introduction to problem solving in parallel (ISBN 1850321353).