Re-inserting into a channel causes a deadlock

I have a steady inbound flow of "jobs" that I feed into an unbuffered channel, and a for range loop that iterates over the items and processes them. If processing an item fails, I re-insert it into the channel so I can try it again later.

The problem is that when I re-insert the item into the channel, the program deadlocks. I understand why it happens: the processing loop isn't reading from the channel while it tries to send, so the send blocks forever. But I can't think of a pattern that solves the problem. Can anybody help me find a solution?

Here is a simple code example showing my problem (https://play.golang.org/p/N_-jWL5aOCo):

package main

import (
    "fmt"
    "time"
)

type Job struct {
    ID       int
    Attempts int
}

func main() {
    ch := make(chan *Job)
    go fetchJobs(ch)

    for job := range ch {
        if success := processJob(job); !success {
            ch <- job
        }
    }
}

func processJob(job *Job) bool {
    job.Attempts++
    fmt.Printf("Processing job %+v
", job)

    // Simulate work.
    time.Sleep(time.Millisecond * 500)

    // Simulate failure on some jobs (IDs 10 to 19, 30 to 39, etc.)
    if job.ID%20 >= 10 && job.Attempts == 1 {
        return false
    }

    return true
}

func fetchJobs(ch chan *Job) {
    for i := 0; ; i++ {
        ch <- &Job{ID: i}
    }
}

The simplest solution is to use a new goroutine to put it back:

if success := processJob(job); !success {
    go func() { ch <- job }()
}

If you want to avoid spawning a new goroutine for this, another solution is to keep a "storage" of failed jobs. The simplest storage is a slice: if processing a job fails, append it to the failed jobs.

Before fetching a new job (or after, depending on how quickly you want to requeue failed jobs), the producer could check whether there are any failed jobs and, if so, enqueue some (or all) of them, as in the sketch below. Of course, access to this failed-jobs storage must be synchronized.
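A minimal sketch of that idea, assuming a mutex-protected slice. The failed slice, mu mutex and addFailed helper are illustrative names, not part of the original code, and "sync" would need to be added to the imports:

var (
    mu     sync.Mutex
    failed []*Job
)

// addFailed stores a job whose processing failed. The consumer calls this
// instead of sending on ch, so it never blocks on its own channel.
func addFailed(job *Job) {
    mu.Lock()
    failed = append(failed, job)
    mu.Unlock()
}

// fetchJobs re-enqueues stored failed jobs before producing a new one.
func fetchJobs(ch chan *Job) {
    for i := 0; ; i++ {
        // Swap out the failed-jobs storage under the lock, then send the
        // jobs without holding it.
        mu.Lock()
        retries := failed
        failed = nil
        mu.Unlock()
        for _, job := range retries {
            ch <- job
        }

        ch <- &Job{ID: i}
    }
}

In main, the !success branch would then call addFailed(job) instead of ch <- job.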

Also note that you should not requeue failed jobs unconditionally: if the error is permanent, they will never complete, and they could end up blocking your whole system. A simple safeguard is to requeue a job only while its retry counter is below a limit, as shown below.
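For example, the consumer loop could guard the requeue with the attempt counter. maxAttempts is an illustrative constant, and addFailed is the hypothetical helper from the sketch above:

const maxAttempts = 3 // illustrative retry limit

for job := range ch {
    if success := processJob(job); !success {
        if job.Attempts < maxAttempts {
            addFailed(job) // requeue via the failed-jobs storage
        } else {
            fmt.Printf("Giving up on job %d after %d attempts\n", job.ID, job.Attempts)
        }
    }
}

This works because processJob already increments job.Attempts on every call.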

Although if you have an unbuffered job channel and a single producer and consumer, requeueing might be an unnecessary complication. You may just as well retry a failed job a few times in place, and treat it as permanently failed if it does not succeed within some retry or time limit, as sketched below.
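A sketch of that simpler approach, retrying in place inside the consumer loop (again, maxAttempts is an illustrative limit):

for job := range ch {
    // Retry the job in place; processJob increments job.Attempts itself.
    for !processJob(job) {
        if job.Attempts >= maxAttempts {
            fmt.Printf("Job %d failed permanently after %d attempts\n", job.ID, job.Attempts)
            break
        }
        time.Sleep(time.Second) // simple back-off before the next attempt
    }
}

While the consumer is retrying, the producer simply blocks on the unbuffered channel, which is often acceptable with a single producer and consumer.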

Note that this example contains an infinite loop; eventually it will run out of memory.