I'm trying to speed up a puzzle solver with parallel processing.
In C99 with OpenMP, I can do that simply by putting a #pragma omp parallel for
before the for loop in question, and then it's up to the runtime to distribute the load between the CPUs.
The official documentation for Go at https://golang.org/doc/effective_go.html#parallel, however, seems to suggest that for parallel processing I must (0) manually get the number of cores from the runtime environment, (1) loop over said cores, (2) effectively write a distinct for loop for each core, and (3) loop over the cores once again to make sure everything got processed.
Am I missing something? For the simplest case, is OpenMP with ancient C really more convenient than the brand-new Go that's touted as C's best replacement? And for a more complicated example, how exactly do you split up a range
between the CPUs?
Effective Go is outdated on that point: since Go 1.5, the runtime sets GOMAXPROCS to the number of logical CPUs by default (you can still set it manually to force a particular value).
Here's a very simple example of parallel processing of a slice:
import (
	"math/rand"
	"sync"
)

data := make([]float64, SZ)

var wg sync.WaitGroup
for i := range data {
	wg.Add(1)
	go func(v *float64) {
		// note: global rand is a bad example workload here,
		// because it guards its state with a mutex
		*v = rand.Float64()
		wg.Done()
	}(&data[i])
}
wg.Wait()