I have a question about how to handle disconnects in a Go TCP client.
I have the function below, which runs in a continuous loop and reconnects on each iteration:
func (t *TcpConnector) Connect() {
    for {
        if t.reconnectAttempt > t.reconnectMaxRetries {
            t.onNoReconnect(t.reconnectAttempt)
            return
        }
        if t.reconnectAttempt > 0 {
            t.onReconnect()
        }
        c, err := net.Dial("tcp", t.Url)
        if err != nil {
            t.AppState.Log.Errorf("Connection to tcp server failed with error : %+v", err)
            if t.autoReconnect {
                t.reconnectAttempt++
                continue
            }
            return
        }
        t.Log.Infof("Connection established @ %s", time.Now())
        t.Conn = c
        defer t.Conn.Close()
        var wg sync.WaitGroup
        wg.Add(1)
        go t.SendHeartbeat()
        go t.Write(&wg)
        go t.RecieveResponse()
        err = t.SendLoginPacket()
        if err != nil {
            appsetup.MOSLTcpServerLoginFailureAlert(t.AppState, err)
            continue
        }
        wg.Wait()
    }
}
I block this loop with a sync.WaitGroup and only continue once the WaitGroup counter drops to zero.
SendHeartbeat writes a packet to the TCP server every 30 seconds:
func (t *TcpConnector) SendHeartbeat() {
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()
    for x := range ticker.C {
        messageHeader, err := getMessageHeaderInBytes(129, 0, 0, t.AppState.Config.Username)
        if err != nil {
            t.AppState.Log.Errorf("Error Converting heartbeat request data to bytes: %+v, with seconds: %s", err, x)
            return
        }
        t.DataChannel <- messageHeader
    }
}
This sends byte data over a channel, where it is received by the t.Write() function. Many other functions also write to this channel, which is drained by the Write() function:
func (t *TcpConnector) Write(wg *sync.WaitGroup) {
    for {
        select {
        case data := <-t.DataChannel:
            _, err := t.Conn.Write(data)
            if err != nil {
                t.AppState.Log.Errorf("Error Sending byte data to TCP server: %+v", err)
                t.Conn.Close()
                wg.Done()
            }
        }
    }
}
As you can see, on a write error I call wg.Done(), which unblocks the Connect() function so that it can proceed with reconnecting to the server.
Sometimes the TCP socket gets closed and the Write function errors out. But since multiple goroutines are running the Write() function in parallel, all of them error out and I get the error below:
panic: sync: negative WaitGroup counter
I understand this happens because the Write() function calls wg.Done() multiple times when it errors out (because multiple goroutines invoke it).
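For reference, the panic can be reproduced in isolation. This is a minimal sketch, not the code above: doneTwice is a hypothetical helper that calls Done twice after a single Add(1) and recovers the resulting panic so it can be printed. It is also worth noting that in the Write() shown earlier nothing returns after wg.Done(), so even a single goroutine may loop around and call Done again on its next failed write.

```go
package main

import (
	"fmt"
	"sync"
)

// doneTwice is a hypothetical helper for illustration only.
// It calls Done twice after a single Add(1) and returns the
// recovered panic value (nil if nothing panicked).
func doneTwice() (recovered interface{}) {
	defer func() { recovered = recover() }()

	var wg sync.WaitGroup
	wg.Add(1)
	wg.Done() // counter goes 1 -> 0
	wg.Done() // counter goes 0 -> -1, which panics
	return nil
}

func main() {
	fmt.Println(doneTwice())
}
```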
Is there a better way I can handle this?
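One possible direction, shown only as a sketch under assumptions: give each connection its own state, signal failure by closing a done channel through sync.Once, and pair every wg.Add(1) with exactly one deferred wg.Done() in the goroutine that was added. All names here (session, failingWriter, fail, writeLoop, runSession, the closes counter) are hypothetical and are not part of the code above; failingWriter merely stands in for a closed TCP connection.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"sync"
)

// failingWriter is a hypothetical stand-in for a closed connection:
// every Write returns an error.
type failingWriter struct{}

func (failingWriter) Write(p []byte) (int, error) {
	return 0, errors.New("connection closed")
}

// session holds the state of one connection attempt (illustrative names).
type session struct {
	conn     io.Writer
	data     chan []byte
	done     chan struct{} // closed exactly once when the session fails
	failOnce sync.Once
	closes   int // counts how often done was actually closed (for the demo)
}

// fail can be called from any goroutine, any number of times.
// Only the first call closes done; later calls are no-ops.
func (s *session) fail() {
	s.failOnce.Do(func() {
		s.closes++
		close(s.done)
	})
}

// writeLoop drains data until a write fails or the session is done,
// and then returns instead of looping again.
func (s *session) writeLoop() {
	for {
		select {
		case <-s.done:
			return
		case b := <-s.data:
			if _, err := s.conn.Write(b); err != nil {
				s.fail()
				return
			}
		}
	}
}

// runSession starts the given number of writers (assumed >= 1), sends one
// packet, waits for every writer to exit, and reports how many times the
// done channel was closed.
func runSession(conn io.Writer, writers int) int {
	s := &session{conn: conn, data: make(chan []byte), done: make(chan struct{})}
	var wg sync.WaitGroup
	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done() // exactly one Done per Add
			s.writeLoop()
		}()
	}
	s.data <- []byte("heartbeat") // one writer picks this up and fails
	wg.Wait()                     // all writers have exited
	s.fail()                      // extra calls are harmless
	s.fail()
	return s.closes
}

func main() {
	fmt.Println("done channel closed", runSession(failingWriter{}, 3), "time(s)")
}
```

In a reconnect loop along these lines, each iteration would presumably create a fresh session, and producers such as the heartbeat sender would select on both the data channel and done so they do not block once the writers have stopped.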