golang: how to debug a possible race condition

I wrote a log collector program in Go, which runs a number of goroutines as follows:

  1. Goroutine A runs an HTTP server that lets users view log information.
  2. Goroutine B runs a UDP server that receives log messages sent to it from the LAN.
  3. Goroutine C runs a timer that periodically queries and downloads zipped log archives from an internal HTTP file server (not part of the program).
  4. Goroutines B and C both send processed messages to a channel.
  5. Goroutine D runs a for {} loop with a select statement that receives messages from the channel and flushes them to disk.
  6. There are a few other goroutines, such as one that scans the log archives generated by goroutine D to create SQLite indices.

The problem is that after the program has been running for a few hours, the log viewer HTTP server still works well, but NO messages come in from either the UDP or the file server goroutine. I know that log messages are being sent continuously from various sources, and if I restart the program, it starts processing incoming logs again.

I built the program with -race, and it did find some problematic code, which I fixed, but the problem persists. What's more, the old version of the code running on our production server works well despite containing those races.

My question is: how can I proceed to pinpoint the problem? The following is the key loop in my log processing goroutine:

for {
    select {
    case msg := <-logCh:
        logque.Cache(msg)
    case <-time.After(time.Second):
    }
    if time.Since(lastFlush) >= 3 * time.Second {
        logque.Flush()
        lastFlush = time.Now()
    }
}
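One general way to approach a hang like this is to dump the stacks of all goroutines and look for ones parked in "chan send" or "chan receive". The sketch below is a minimal illustration using runtime/pprof from the standard library; the helper name dumpGoroutines and the deliberately stuck sender are hypothetical and not part of the program described above.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"time"
)

// dumpGoroutines (hypothetical helper) returns the stack traces of all
// goroutines in the same format the runtime prints on an unrecovered panic.
func dumpGoroutines() string {
	var buf bytes.Buffer
	// debug=2 includes each goroutine's wait state, e.g. "chan send".
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "goroutine dump failed: " + err.Error()
	}
	return buf.String()
}

func main() {
	// Illustration only: a sender stuck on a channel nobody reads from.
	stuck := make(chan int)
	go func() { stuck <- 1 }()

	time.Sleep(50 * time.Millisecond) // give the sender time to block
	fmt.Println(dumpGoroutines())
}
```

Since the program already runs an HTTP server, importing net/http/pprof and requesting /debug/pprof/goroutine?debug=2 should give the same kind of dump from a live process, assuming the default mux is the one being served.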

I finally found the code that caused the blocking. It is in the following loop:

for {
    select {
    case msg := <-logCh:
        logque.Cache(msg)
    case <-time.After(time.Second):
    }
    if time.Since(lastFlush) >= 3 * time.Second {
        logque.Flush()
        lastFlush = time.Now()
    }
}

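Inside logque.Flush() there is some code that generates log messages, which are in turn written into the same channel, eventually filling up the channel's buffer. Because this loop is the channel's only receiver, once the buffer is full the send inside Flush() blocks forever and nothing drains the channel again. This only happens when I turn on debug mode; the production code does not do this in the Flush() method.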
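The effect can be reproduced in isolation. The following is a minimal sketch, not the original code: flush stands in for logque.Flush() in debug mode, and the capacity and message counts are made-up values chosen only to show the difference between fitting in the buffer and overflowing it.

```go
package main

import (
	"fmt"
	"time"
)

// flush stands in for logque.Flush() in debug mode (hypothetical): it
// writes n debug messages into the same channel its caller is draining.
func flush(logCh chan string, n int) {
	for i := 0; i < n; i++ {
		logCh <- fmt.Sprintf("debug: flushed batch %d", i)
	}
}

// consumerBlocks reports whether a consumer that calls flush on its own
// channel gets stuck, using a timeout to detect the hang.
func consumerBlocks(capacity, debugMsgs int) bool {
	logCh := make(chan string, capacity)
	done := make(chan struct{})
	go func() {
		// This goroutine is the only reader, and it is busy sending.
		flush(logCh, debugMsgs)
		close(done)
	}()
	select {
	case <-done:
		return false
	case <-time.After(200 * time.Millisecond):
		return true // the goroutine is parked on a full channel
	}
}

func main() {
	fmt.Println(consumerBlocks(4, 3)) // false: all messages fit in the buffer
	fmt.Println(consumerBlocks(4, 5)) // true: the fifth send has no reader
}
```

This also suggests why the hang took hours to appear: the loop only gets stuck once the debug messages produced by a flush exceed whatever free space is left in the buffer.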

To answer my own question, the method I used to nail down the problem is pretty simple:

// The channel is full: sending now would block the caller.
if len(logCh) >= LOG_CHANNEL_CAP {
    // drop the message or store it into
    // secondary buffer...
    return
}
logCh <- msg
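Note that checking len() and then sending are two separate steps, so with several senders the channel can fill up between the check and the send. A common alternative is a non-blocking send using select with a default case. The sketch below assumes a string message type and a hypothetical helper named trySend; neither comes from the original program.

```go
package main

import "fmt"

// trySend (hypothetical helper) tries to enqueue msg without blocking.
// It returns false when the channel's buffer is full, so the caller can
// drop the message or divert it to a secondary buffer.
func trySend(ch chan<- string, msg string) bool {
	select {
	case ch <- msg:
		return true
	default:
		return false
	}
}

func main() {
	logCh := make(chan string, 2) // small capacity, for illustration only
	dropped := 0
	for i := 0; i < 5; i++ {
		if !trySend(logCh, fmt.Sprintf("msg %d", i)) {
			dropped++
		}
	}
	fmt.Println(len(logCh), dropped) // prints "2 3"
}
```

Counting dropped messages, as main does here, also gives a cheap signal that the consumer has stopped draining the channel.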