I have a fundamental understanding problem about how to make sure that spawned goroutines are "closed" properly in the context of long-running processes. I watched talks regarding that topic and read about best practices. In order to understand my question please refer to the video "Advanced Go Concurrency Patterns" here
For the following, if you run the code on your machine, please export the environment variable GOTRACEBACK=all so that you can see the goroutine states after the panic.
I put the code for the original example here: naive (it does not execute on the Go Playground, I guess because a time statement is used. Please copy the code and execute it locally.)
The result of the panic of the naive implementation after execution is
panic: show me the stacks

goroutine 1 [running]:
panic(0x48a680, 0xc4201d8480)
	/usr/lib/go/src/runtime/panic.go:500 +0x1a1
main.main()
	/home/flx/workspace/go/go-rps/playground/ball-naive.go:18 +0x16b

goroutine 5 [chan receive]:
main.player(0x4a4ec4, 0x2, 0xc42006a060)
	/home/flx/workspace/go/go-rps/playground/ball-naive.go:23 +0x61
created by main.main
	/home/flx/workspace/go/go-rps/playground/ball-naive.go:13 +0x76

goroutine 6 [chan receive]:
main.player(0x4a4ec6, 0x2, 0xc42006a060)
	/home/flx/workspace/go/go-rps/playground/ball-naive.go:23 +0x61
created by main.main
	/home/flx/workspace/go/go-rps/playground/ball-naive.go:14 +0xad
exit status 2

That demonstrates the underlying problem of leaving dangling goroutines on the system, which is especially bad for long-running processes.
So for my personal understanding I tried two slightly more sophisticated variants to be found here:
generator pattern with quit channel
(again, not executable on the playground, because "process takes too long")
The first solution is not fitting for various reasons, even leading to non-determinism in executed steps, depending on goroutine execution speed.
Now I thought (and here finally comes the question!) that the second solution with the quit channel would be appropriate to eliminate all execution traces from the system before exiting. Anyhow, "sometimes" the program exits too fast and the panic reports an additional runnable goroutine still residing on the system. The panic output:

panic: show me the stacks

goroutine 1 [running]:
panic(0x48d8e0, 0xc4201e27c0)
	/usr/lib/go/src/runtime/panic.go:500 +0x1a1
main.main()
	/home/flx/workspace/go/go-rps/playground/ball-perfect.go:20 +0x1a9

goroutine 20 [runnable]:
main.player.func1(0xc420070060, 0x4a8986, 0x2, 0xc420070120)
	/home/flx/workspace/go/go-rps/playground/ball-perfect.go:27 +0x211
created by main.player
	/home/flx/workspace/go/go-rps/playground/ball-perfect.go:36 +0x7f
exit status 2

My question is: that should not happen, right? I do use a quit channel to clean up state before moving on to the panic.
I did a final try of implementing safe cleanup behavior here: artificial wait time for runnables to close
Anyhow, that solution does not feel right, and it may also not scale to a large number of goroutines?
What would be the recommended and most idiomatic pattern to ensure correct cleanup?
Thanks for your time
You are being fooled by the output: your "generator pattern with quit channel" works perfectly fine, and the two goroutines do terminate properly.
You see them in the trace because you panic too early. Remember: you have two goroutines running concurrently with main. main "stops" these goroutines by signaling on the quit channel. After the two sends on lines 18 and 19, the two receives on line 32 have happened. And nothing more! You still have three goroutines running: main is between lines 19 and 20, and the player goroutines are between lines 32 and 33. If the panic in main now happens before the return in player, the player goroutines still exist and are shown in the panic stack trace. These goroutines would have ended a few milliseconds later if only the scheduler had had time to execute the return on line 33 (which it did not, because you killed the program by panicking).
This is an instance of the "main ends too early to see concurrent goroutines do work" problem that is asked about once a month here. You do see the concurrent goroutines doing work, but not all of their work. You might try sleeping 2 milliseconds before the panic; your player goroutines will then have time to execute the return, and everything is fine.