What's the best practice to synchronise wait groups and channels? I want to handle messages and block on a loop, and it appears that delegating the closing of the channel to another go routine seems to be a weird solution?
func Crawl(url string, depth int, fetcher Fetcher) {
ch := make(chan string)
var waitGroup sync.WaitGroup
waitGroup.Add(1)
go crawlTask(&waitGroup, ch, url, depth, fetcher)
go func() {
waitGroup.Wait()
close(ch)
}()
for message := range ch {
// I want to handle the messages here
fmt.Println(message)
}
}
func crawlTask(waitGroup *sync.WaitGroup, ch chan string, url string, depth int, fetcher Fetcher) {
defer waitGroup.Done()
if depth <= 0 {
return
}
body, urls, err := fetcher.Fetch(url)
if err != nil {
ch <- err.Error()
return
}
ch <- fmt.Sprintf("found: %s %q
", url, body)
for _, u := range urls {
waitGroup.Add(1)
go crawlTask(waitGroup, ch, u, depth-1, fetcher)
}
}
func main() {
Crawl("http://golang.org/", 4, fetcher)
}
// truncated from https://tour.golang.org/concurrency/10 webCrawler
As an alternative to using a waitgroup and extra goroutine, you can you a separate channel for ending goroutines.
This is (also) idiomatic in Go. It involves blocking using a select
control group.
So you'd have to make
a new channel, typically with an empty struct as it's value (eg closeChan := make(chan struct{})
which, when closed (close(closeChan)
) would end the goroutine itself.
Instead of ranging over a chan
, you can use a select to block until either fed data or closed.
The code in Crawl
could look something like this:
for { // instead of ranging over a to-be closed chan
select {
case message := <-ch:
// handle message
case <-closeChan:
break // exit goroutine, can use return instead
}
}
And then in crawlTask
, you could close the closeChan
(passed in as another parameter, like ch
when you return (I figure that's when you want the other goroutine to end, and stop handling messages?)
if depth <= 0 {
close(closeChan)
return
}
Using a separate 'closer' go-routine prevents a deadlock.
If the wait/close operation were in the main go-routine before the for-range loop, it would never end, because all of the 'worker' go-routines would block in the absence of a receiver on the channel. And if it were placed in the main go-routine after the for-range loop, it would be unreachable, because the loop would block with no-one to close the channel.
This explanation was borrowed from 'The Go Programming Language' book (8.5 Looping in parallel).