I'm using https://github.com/google/go-gcm to send push notifications from our Go backend to Android devices. Recently, these push notifications started failing because the call to SendXmpp() was returning with the following error:
write tcp <IP>:<port>-><IP>:<port>: write: connection timed out
Restarting the Go process that called SendXmpp() makes this error go away, and push notifications start working again. But of course, restarting the Go process isn't ideal. Is there something I can do explicitly to handle this kind of error? For instance, should I close the current XmppClient and retry sending the message, so that the retry instantiates a new XmppClient and opens a new connection?
I would recommend using or implementing an (exponential) backoff. There are a number of options on GitHub: https://github.com/search?utf8=%E2%9C%93&q=go+backoff. That's surely not a comprehensive list, and it's not terribly difficult to implement one yourself.
The basic idea is that you pass the function you'd like to call into the backoff function, which calls it until it either succeeds or hits a maximum number of failures. After each failure, the wait before the next attempt is increased. If you're hammering a server and causing it to return errors, a method like this will typically solve your problems and make your application more reliable.
Additionally, I'd recommend looking for one that has an abort feature. This can be implemented fairly easily in Go by passing a channel into the backoff function (along with the function you want to call). Then, if your app needs to stop, you can signal on the abort channel so that the backoff isn't left sitting in something like a 300-second wait.
Even if this doesn't resolve your specific problem, it will generally have a positive effect on your app's reliability and on the third-party APIs you interact with (you don't want to DoS your partners).