I have a binary tree that accessing a node is relatively fast, with the exception of the leaves - they might be 100-1000 times slower. I have a recursive algorithm that I would like to implement in go (I am new to it).
Because I have to get to the leaves to get benefits from the parallelism I need to parallelize the execution higher in the tree. This though might result in millions of goroutines. Limiting this with semaphore does not seem the 'go' way- there is no such sync primitive. Another concern I have is how expensive is, in fact, a channel, should I use wait group instead.
My tree is abstract and the algorithm runs over it identifying the items by level and index.
// l3 0
// / \
// l2 0 1
// / \ / \
// l1 0 1 2 3
// / \ / \ / \ / \
// l0 0 1 2 3 4 5 6 7
For example, I can use such to compute sum of all items in a vector with the function:
func Sum(level, index int, items []int) int {
if level == 0 {return items[index]}
return Sum(level-1, index*2, items) + Sum(level-1, index*2+1, items)
}
What should be my approach? Can someone point me to a recursive tree multithreaded algorithm implemented in go?
It sounds like you need a worker pool. Here's an example I just wrote: https://play.golang.org/p/NRM0yyQi8X
package main
import (
"fmt"
"sync"
"time"
)
type Leaf struct {
// Whatever
}
func worker(i int, wg *sync.WaitGroup, in <-chan Leaf) {
for leaf := range in {
time.Sleep(time.Millisecond * 500)
fmt.Printf("worker %d finished work: %#v
", i, leaf)
}
fmt.Printf("worker %d exiting
", i)
wg.Done()
}
func main() {
var jobQueue = make(chan Leaf)
var numWorkers = 10
// the waitgroup will allow us to wait for all the goroutines to finish at the end
var wg = new(sync.WaitGroup)
for i := 0; i < numWorkers; i++ {
wg.Add(1)
go worker(i, wg, jobQueue)
}
// enqueue work (this goes inside your tree traversal.)
for i := 0; i < 100; i++ {
jobQueue <- Leaf{}
}
// closing jobQueue will cause all goroutines to exit the loop on the channel.
close(jobQueue)
// Wait for all the goroutines to finish
wg.Wait()
}
I strongly suggest reading this excellent blog post from top to bottom:
https://blog.golang.org/pipelines
It covers not only an example of exactly what you need (i.e. parallelized file-tree walk calculating MD5 file checksums), but much much more:
- fan-in/fan-out channel techniques
- parallelism
- pipeline cancellations via done channels
- pipeline error chaining via error channels
- bounded parallelism
The last topic, bounded parallelism, is used to ensure 'walking' large-node directory-trees do not create excessive go-routines: bounded.go