How to coordinate shutdown with many goroutines

Say I have a function

type Foo struct {}

func (a *Foo) Bar() {
    // some expensive work - does some calls to redis
}

which gets executed within a goroutine at some point in my app. Lots of these may be executing at any given point. Prior to application termination, I would like to ensure all remaining goroutines have finished their work.

Can I do something like this:

type Foo struct {
    wg sync.WaitGroup
}

func (a *Foo) Close() {
    a.wg.Wait()
}

func (a *Foo) Bar() {
    a.wg.Add(1)
    defer a.wg.Done()

    // some expensive work - does some calls to redis
}

Assume that Bar is executed within a goroutine, that many of these may be running at a given time, that Bar should not be called once Close has been called, and that Close is called upon a SIGTERM or SIGINT.

Does this make sense?

Usually I would see the Bar function look like this:

func (a *Foo) Bar() {
    a.wg.Add(1)

    go func() {
        defer a.wg.Done()
        // some expensive work - does some calls to redis
    }()
}

Yes, WaitGroup is the right answer. You can call WaitGroup.Add at any time while the counter is greater than zero, as per the documentation:

Note that calls with a positive delta that occur when the counter is zero must happen before a Wait. Calls with a negative delta, or calls with a positive delta that start when the counter is greater than zero, may happen at any time. Typically this means the calls to Add should execute before the statement creating the goroutine or other event to be waited for. If a WaitGroup is reused to wait for several independent sets of events, new Add calls must happen after all previous Wait calls have returned. See the WaitGroup example.

But there is one catch: you should keep the counter greater than zero until Close is called. That usually means calling wg.Add in NewFoo (or something like that) and wg.Done in Close. To prevent multiple calls to Done from breaking the wait group, wrap the body of Close in sync.Once. You may also want to prevent new calls to Bar() once Close has been called.
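A minimal sketch of that idea, under a few assumptions of mine: Bar takes a hypothetical work func() parameter standing in for the redis calls, it reports whether the call was accepted, and a mutex-guarded closed flag is one possible way to reject new calls after Close.

```go
package main

import (
    "fmt"
    "sync"
    "time"
)

// Foo tracks in-flight Bar calls so that Close can wait for them.
type Foo struct {
    mu     sync.Mutex // guards closed
    closed bool
    wg     sync.WaitGroup
    once   sync.Once
}

// NewFoo takes one count on the WaitGroup for the lifetime of Foo,
// so the counter stays above zero until Close is called.
func NewFoo() *Foo {
    f := &Foo{}
    f.wg.Add(1)
    return f
}

// Bar is meant to be called from its own goroutine, as in the question.
// It runs work and reports false if Foo was already closed.
func (a *Foo) Bar(work func()) bool {
    a.mu.Lock()
    if a.closed {
        a.mu.Unlock()
        return false
    }
    a.wg.Add(1) // the counter is still >= 1 here, so Add cannot race with Wait
    a.mu.Unlock()
    defer a.wg.Done()

    work() // hypothetical stand-in for the expensive redis calls
    return true
}

// Close rejects new work, releases the count taken in NewFoo exactly once,
// and waits for every accepted Bar call to finish.
func (a *Foo) Close() {
    a.once.Do(func() {
        a.mu.Lock()
        a.closed = true
        a.mu.Unlock()
        a.wg.Done()
    })
    a.wg.Wait()
}

func main() {
    foo := NewFoo()
    for i := 0; i < 3; i++ {
        go foo.Bar(func() { time.Sleep(10 * time.Millisecond) })
    }
    foo.Close() // returns once every accepted Bar call has finished
    fmt.Println("accepted after Close:", foo.Bar(func() {}))
}
```

Calling Close a second time is harmless here: sync.Once skips the Done, and Wait returns immediately because the counter is already zero.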

WaitGroup is one way; however, the Go team introduced errgroup for exactly your use case. The most inconvenient part of leaf bebop's answer is the disregard for error handling. Error handling is the reason errgroup exists, and idiomatic Go code should never swallow errors.

However, keeping the signatures of your Foo struct (except for a cosmetic workerNumber), and still without error handling, my proposal looks like this:

package main

import (
    "fmt"
    "math/rand"
    "time"

    "golang.org/x/sync/errgroup"
)

type Foo struct {
    errg errgroup.Group
}

func NewFoo() *Foo {
    foo := &Foo{
        errg: errgroup.Group{},
    }
    return foo
}

func (a *Foo) Bar(workerNumber int) {
    a.errg.Go(func() error {
        select {
        // simulates the long-running calls
        case <-time.After(time.Second * time.Duration(rand.Intn(10))):
            fmt.Printf("worker %d completed its work\n", workerNumber)
            return nil
        }
    })
}

func (a *Foo) Close() {
    a.errg.Wait()
}

func main() {
    foo := NewFoo()

    for i := 0; i < 10; i++ {
        foo.Bar(i)
    }

    <-time.After(time.Second * 5)
    fmt.Println("Waiting for workers to complete...")
    foo.Close()
    fmt.Println("Done.")
}

The benefit here is that if you introduce error handling in your code (you should), you only need to modify this code slightly. In short, errg.Wait() would return the first redis error, and Close() could propagate it up the stack (to main, in this case).

Using the context package as well, you would also be able to cancel any running redis call immediately if one of them fails. There are examples of this in the errgroup documentation.

I think waiting indefinitely for all the goroutines to finish is not the right approach. If one of the goroutines blocks or hangs for some reason and never terminates, what should happen: kill the process, or keep waiting for the goroutines to finish?

Instead, you should wait with a timeout and kill the app regardless of whether all the goroutines have finished.
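One way to sketch such a bounded wait, assuming a hypothetical helper closeWithTimeout that wraps whatever blocking shutdown call you have (for example foo.Close from the question):

```go
package main

import (
    "fmt"
    "time"
)

// closeWithTimeout (hypothetical helper) runs closeFn in its own goroutine
// and waits at most d for it. It reports true if closeFn returned in time.
func closeWithTimeout(closeFn func(), d time.Duration) bool {
    done := make(chan struct{})
    go func() {
        defer close(done)
        closeFn()
    }()

    timer := time.NewTimer(d)
    defer timer.Stop()

    select {
    case <-done:
        return true
    case <-timer.C:
        // The goroutine running closeFn is abandoned here; this is only
        // acceptable because the process is expected to exit right after.
        return false
    }
}

func main() {
    quick := func() { time.Sleep(10 * time.Millisecond) }
    stuck := func() { select {} } // simulates a shutdown that never returns

    fmt.Println("quick finished in time:", closeWithTimeout(quick, time.Second))
    fmt.Println("stuck finished in time:", closeWithTimeout(stuck, 50*time.Millisecond))
}
```

When the helper returns false, the caller would typically log that the deadline passed and exit anyway.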

Edit: Thanks to @leaf bebop for pointing it out; I misunderstood the question in my original answer.

The context package can be used to tell all the goroutines to handle the kill signal.

appCtx, cancel := context.WithCancel(context.Background())

Here appCtx has to be passed to all the goroutines.

On an exit signal, call cancel().

Functions running as goroutines can decide for themselves how to handle the cancelled context.
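Put together, that could look roughly like the sketch below. It uses signal.NotifyContext (Go 1.16+) to cancel appCtx on SIGINT or SIGTERM. The worker and runUntilSignal names are hypothetical, the ticker stands in for real work, and the extra timeout is only there so the example ends on its own.

```go
package main

import (
    "context"
    "fmt"
    "os"
    "os/signal"
    "sync"
    "syscall"
    "time"
)

// worker does one unit of work per tick until ctx is cancelled.
func worker(ctx context.Context, id int, stopped chan<- int) {
    ticker := time.NewTicker(10 * time.Millisecond)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            stopped <- id // buffered by the caller, so this never blocks
            return
        case <-ticker.C:
            // one unit of work would go here
        }
    }
}

// runUntilSignal (hypothetical helper) starts n workers and returns how many
// of them stopped after SIGINT/SIGTERM arrived or limit elapsed.
func runUntilSignal(n int, limit time.Duration) int {
    // appCtx is cancelled when the process receives SIGINT or SIGTERM.
    appCtx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
    defer stop()
    // Assumption for the demo: also cancel after limit so the sketch terminates.
    appCtx, cancel := context.WithTimeout(appCtx, limit)
    defer cancel()

    stopped := make(chan int, n)
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            worker(appCtx, id, stopped)
        }(i)
    }
    wg.Wait() // returns once every worker has observed the cancellation
    return len(stopped)
}

func main() {
    fmt.Println("workers stopped:", runUntilSignal(3, 200*time.Millisecond))
}
```

In a real service you would drop the timeout and let the signal alone cancel appCtx, possibly combined with a bounded wait as suggested above.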

Using context cancellation in Go

A pattern I use a lot is: https://play.golang.org/p/ibMz36TS62z

package main

import (
    "fmt"
    "sync"
    "time"
)

type response struct {
    message string
}

func task(i int, done chan response) {
    time.Sleep(1 * time.Second)
    done <- response{fmt.Sprintf("%d done", i)}
}

func main() {

    responses := GetResponses(10)

    fmt.Println("all done", len(responses))
}

func GetResponses(n int) []response {
    donequeue := make(chan response)
    wg := sync.WaitGroup{}
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func(value int) {
            defer wg.Done()
            task(value, donequeue)
        }(i)
    }
    go func() {
        wg.Wait()
        close(donequeue)
    }()
    responses := []response{}
    for result := range donequeue {
        responses = append(responses, result)
    }

    return responses
}

This makes it easy to throttle as well: https://play.golang.org/p/a4MKwJKj634

package main

import (
    "fmt"
    "sync"
    "time"
)

type response struct {
    message string
}

func task(i int, done chan response) {
    time.Sleep(1 * time.Second)
    done <- response{fmt.Sprintf("%d done", i)}
}

func main() {

    responses := GetResponses(10, 2)

    fmt.Println("all done", len(responses))
}

func GetResponses(n, concurrent int) []response {

    throttle := make(chan int, concurrent)
    for i := 0; i < concurrent; i++ {
        throttle <- i
    }
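    // Buffered so that workers never block on send while main is still spawning.
    donequeue := make(chan response, n)
    wg := sync.WaitGroup{}
    for i := 0; i < n; i++ {
        wg.Add(1)
        <-throttle // acquire a slot; blocks while `concurrent` tasks are running
        go func(value int) {
            defer wg.Done()
            defer func() { throttle <- 1 }() // release the slot once the task finishes
            task(value, donequeue)
        }(i)
    }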
    go func() {
        wg.Wait()
        close(donequeue)
    }()
    responses := []response{}
    for result := range donequeue {
        responses = append(responses, result)
    }

    return responses
}