I'm new to Go so I apologize in advance if the answer to my question is obvious :)
I'm planning a producer that reads a file and send each line to a channel, like:
scanner := bufio.NewScanner(file)
for scanner.Scan() {
processingChan <- scanner.Text()
}
and add some goroutines to consume the lines.
now, what I want is that if ANY line fails to process in a goroutine (let's say the line contains an invalid value for my business rules), I want to stop the producer loop, close the file (already defered) and finish the program.
the question is: how can I "notify" the producer loop/for to stop?
I found someone suggesting:
for scanner.Scan() {
select {
case <- quit:
// break / return
default:
// send next line to channel
}
}
and the consumer goroutines would write to a "quit" (or error) channel in case of any fault.
this approach possibly solves the question, but I wonder if there is a cleaner/better or just common/popular approach.
Correct, use the quit channel. Especially as you're already sending to the channel in the loop, handling additional one is easy. However, I wouldn't use the form you proposed, but simpler and safer version:
for scanner.Scan() {
select {
case <- quit:
return
case processingChan <- scanner.Text():
}
}
Why is it safer? Because it doesn't deadlock, contrary to your example with default
. You might be lucky and never encounter it, but there're scenarios where you will. The problem lies in the fact you have two routines talking to each other, which always needs a little bit more of attention. Consider this:
quit := make(chan error, 1)
prod := make(chan int)
go func() {
for n := range prod {
runtime.Gosched()
if n%66 == 0 {
quit <- errors.New("2/3 of evil")
return
}
}
}()
for n := 1; n < 1000; n++ {
select {
case <-quit:
fmt.Println(n)
return
default:
prod <- n
}
}
// https://play.golang.org/p/3kDRAAwaKR
Boom! Main routine is trying to send to prod
channel, but there is nobody to receive it; the same issue with our consumer.
Adding buffers to the channels won't solve the problem either, but would make it less likely.
Compare the previous example with the following change:
select {
case <-quit:
fmt.Println(n)
return
case prod <- n:
}
// https://play.golang.org/p/pz8DMYdrVV
Works nicely.
I understand one would like to use the first option to make sure they're quitting as early as possible, but it's usually not a massive issue if you send one or two additional items for processing before exiting.
I think you answered your own question. In my opinion, it is the cleanest most idiomatic approach. And I think most Gophers would agree. The other options is to share state through some variable which you'd have to wrap in a Mutex
, you'd still have a similar for loop though it would have an if at the top or bottom where it checks the flag to see if it's supposed to abort.
I think when creating consumer and producer structs the best design is to clearly name the methods that are intended to run in Go routines like ReadFilesAsync
and define the quit and/or abort channel on the same struct. It gives consumers of your classes a clean, simple, consistent thing to interact with. If the method is async, you call it within a go routine. If you want to stop it, the object you called the method on also exposes and abort channel you can signal on to do that. This removes the need for boiler plate code like declaring the abort channel in the calling scope and passing it into the asynchronous method.
EDIT: note this is more about stopping a goroutine than it is about stopping a for. That's just a matter of putting in break
or return
. The need for coordination only exists if that for loop is running in a goroutine.
I don't feel like expert, but I think your approach is good. I could imagine bit different approach with goroutine. The input of goroutines would be a channel of runes. Than ask to stop could be close an input channel and result could be result of each iteration.
But in this simple case it could be probably longer and slower than your code, therefore I thing you write a good code.