I am learning Go channels from the Go Tour web crawler exercise. My understanding is that go func() runs the function in the background; if nothing blocks, it should simply run the function to completion and return. But it seems that go Crawl() below does nothing. Am I understanding this right?
package main

import (
	"fmt"
)

type Fetcher interface {
	// Fetch returns the body of URL and
	// a slice of URLs found on that page.
	Fetch(url string) (body string, urls []string, err error)
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
	// TODO: Fetch URLs in parallel.
	// TODO: Don't fetch the same URL twice.
	// This implementation doesn't do either:
	if depth <= 0 {
		return
	}
	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}
fmt.Printf("found: %s %q
", url, body)
	for _, u := range urls {
		fmt.Println("u is ", u)
		go Crawl(u, depth-1, fetcher)
	}
	return
}

func main() {
	Crawl("https://golang.org/", 4, fetcher)
}
// fakeFetcher is a Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
	body string
	urls []string
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	if res, ok := f[url]; ok {
		return res.body, res.urls, nil
	}
	return "", nil, fmt.Errorf("not found: %s", url)
}
A Go program terminates when the main function of the main package returns. When that happens, the program, including all of its goroutines, exits immediately. See the Go language spec:

Program execution begins by initializing the main package and then invoking the function main. When that function invocation returns, the program exits. It does not wait for other (non-main) goroutines to complete.
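
You can see this behaviour in isolation with a minimal, self-contained sketch (not part of the exercise): main spawns a goroutine and returns straight away, so the goroutine's output almost never appears.

package main

import "fmt"

func main() {
	// Start a goroutine, then fall off the end of main immediately.
	go fmt.Println("hello from a goroutine")
	// The program exits here without waiting, so the goroutine is
	// almost never scheduled in time to print anything.
}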
In this instance, your Crawl function spawns several goroutines and immediately returns, without synchronising on those goroutines to wait for them to complete. Control flow then returns to main, which reaches the end of the function and implicitly returns, halting your program. Note that this interleaving behaviour is not deterministic: in some cases you could get output from some of the goroutines, but it is very unlikely they will be scheduled for execution so promptly.
You need a mechanism for Crawl to block until the goroutines it spawns have completed. There are several ways to do this, the most common and recommended being a sync.WaitGroup.
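
For example, here is a minimal sketch of Crawl using a sync.WaitGroup (add "sync" to the imports). It only fixes the early exit; the "don't fetch the same URL twice" TODO is left out for brevity:

func Crawl(url string, depth int, fetcher Fetcher) {
	if depth <= 0 {
		return
	}
	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("found: %s %q\n", url, body)

	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		// Pass u as an argument so each goroutine crawls its own URL.
		go func(u string) {
			defer wg.Done()
			Crawl(u, depth-1, fetcher)
		}(u)
	}
	wg.Wait() // block until every child crawl has returned
}

Because wg.Wait() runs before Crawl returns, the Crawl("https://golang.org/", 4, fetcher) call in main does not return until the entire tree of goroutines has completed, so the program no longer exits early.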