Fatal error: all goroutines are asleep - deadlock! A simple test case

I am trying to reproduce an issue and reduced it to a minimal case with the following code. If I close all the channels (bypassing the i != 0 test), things work as expected: the WaitGroup counter drops to zero, done is signalled, and main exits fine. When I skip closing one of these channels (on purpose), I expect the main routine to wait, since the WaitGroup semaphore will block indefinitely in this case. Instead, I am getting an error: "fatal error: all goroutines are asleep - deadlock!". Why is that? Have I missed something fundamental, or is this the runtime being overzealous?

package main

import (
    "fmt"
    "sync"
)

const N int = 4

func main() {

    done := make(chan struct{})
    defer close(done)

    fmt.Println("Beginning...")

    chans := make([]chan int, N)
    var wg sync.WaitGroup

    for i := 0; i < N; i++ {
        wg.Add(1)
        chans[i] = make(chan int)
        go func(i int) { // p0
            defer wg.Done()
            for m := range chans[i] {
                fmt.Println("Received ", m)
            }
            fmt.Println("Ending p", i)
        }(i)
    }

    go func() {
        wg.Wait()
        done <- struct{}{} // signal main that we are done
    }()

    for i := 0; i < N; i++ {
        fmt.Println("Closing c", i)
        if i != 0 { // Skip #0 so wg doesn't reach '0'
            close(chans[i])
        }
    }

    <-done // wait to receive signal from anonymous join function
    fmt.Println("Ending.")
}

UPDATE: I edited the code to avoid the race condition. Still getting this error.

The if i != 0 is there because it's intentional. I want the wg.Wait to block forever (with its semaphore never reaching 0.) Why can't I do that? It seems the same as if I were using <-done without a matching done <- struct{}{} somewhere else. Would the compiler complain too in that case?

Here's what's going on:

  • The first go func(i int) { goroutine does not exit because chans[0] is not closed.
  • Because the goroutine does not exit, wg.Done is not called.
  • The call to wg.Wait() blocks forever because of the previous point.
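  • Main blocks forever because the signal is not sent to done.
  • At that point every goroutine in the program is blocked and none can ever run again, so the runtime aborts with "all goroutines are asleep - deadlock!". This is a runtime check, not a compiler one; a bare <-done with no matching send would fail the same way.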

You can fix the deadlock by removing the if i != 0 {, but there is another issue. There is a race on the wait group. It's possible that wg.Done() is called before wg.Add(1) is called. Call wg.Add() before starting the goroutine to avoid the race.
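A minimal sketch of the corrected pattern, under the assumption that the workers only need to drain their channels: wg.Add(1) is called before each goroutine starts, and every channel, including chans[0], is closed. The function name runWorkers and the finished channel are illustrative additions, not part of the original code.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers (an illustrative name) starts n goroutines, each draining its
// own channel, closes every channel, waits for all workers, and returns how
// many of them finished.
func runWorkers(n int) int {
	chans := make([]chan int, n)
	finished := make(chan int, n) // buffered so workers never block on it
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		chans[i] = make(chan int)
		wg.Add(1) // Add before starting the goroutine, so Wait cannot miss it
		go func(i int) {
			defer wg.Done()
			for range chans[i] { // exits once chans[i] is closed
			}
			finished <- i
		}(i)
	}

	for i := 0; i < n; i++ {
		close(chans[i]) // no channel is skipped, so every worker can exit
	}

	wg.Wait() // returns because every worker eventually calls wg.Done
	close(finished)

	count := 0
	for range finished {
		count++
	}
	return count
}

func main() {
	fmt.Println("workers finished:", runWorkers(4))
}
```

Because every range loop terminates, the WaitGroup counter reaches zero and main is never left blocked.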

The if statement in your for loop never closes the first channel (chans[0]), so its goroutine stays blocked in the range loop. That keeps the deferred wg.Done() from ever running, which means wg.Wait() never returns, which in turn means done <- struct{}{} is never sent.

So in short, the if statement in your loop skips closing the first channel, and that causes a deadlock because no goroutine can make progress.

As @CodingPickle pointed out, call wg.Add(1) at the beginning of your for loop, before starting the goroutine, to prevent any race conditions.

http://play.golang.org/p/j1D5LZGUhd