Proxy server unstable due to a growing number of CLOSE_WAIT and TIME_WAIT sockets

I have a SOCKS5 proxy server written in Go. The daemon listens on 10,000 ports, from 15000 to 25000, so it works as a proxy list. We recently started testing it with some clients and ended up with about 500 rps on 5,000 of these ports. I don't think that is much, but I immediately noticed a bunch of problems.

The server runs Ubuntu 18 with 8 cores, 32 GB of RAM and a 1 Gbit/s network link. I see almost 800% CPU usage all the time and a steadily rising number of sockets in the CLOSE_WAIT and TIME_WAIT states. I have been investigating the code carefully for about a week but cannot find any problem: connections are closed everywhere.

pprof shows that the time goes almost entirely to system calls, more precisely socket reads. The ReadAtLeast call here reads the first 3 bytes of the SOCKS5 request to determine the request type.


func (s *Server) Serve(conn net.Conn) error {
    defer conn.Close() // doesn't seem to work either

    _ = conn.SetDeadline(time.Now().Add(time.Second * 30)) // doesn't work

    request, err := NewRequest(conn)
    if err != nil {
        return err
    }

    // Process the client request
    return s.handleRequest(request, conn)
}

func NewRequest(bufConn io.Reader) (*Request, error) {
    header := []byte{0, 0, 0}
    if _, err := io.ReadAtLeast(bufConn, header, 3); err != nil {
        return nil, fmt.Errorf("Failed to get command version: %v", err)
    }

    // ...
}

I have net.ipv4.tcp_fin_timeout=25, so I expected 2MSL to be 50 seconds, but it seems the sockets just don't have enough time to close because new ones come in too fast. That is about TIME_WAIT. As for CLOSE_WAIT, I have no idea what is wrong. I definitely close the connection, but it seems I am not getting a FIN/ACK from the client.

As a temporary workaround, I added a crontab entry that restarts the daemon every 15 minutes. This drops all CLOSE_WAIT connections and reduces TIME_WAIT a little, but it causes downtime, among other things.


You can try:

if tcpConn, ok := conn.(*net.TCPConn); ok {
    // Zero linger: Close() sends an RST instead of a FIN
    _ = tcpConn.SetLinger(0)
}

With this option, Close() aborts the connection with an RST instead of the normal FIN exchange, so the kernel releases sockets in the terminating states (TIME_WAIT, CLOSE_WAIT) sooner.

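Notes:

  • TIME_WAIT: your server initiated the TCP termination; TIME_WAIT is the last state of that TCP connection on your server.
  • CLOSE_WAIT: the client initiated the TCP termination; your server has received and acknowledged the client's FIN and is waiting for your application to close its side of the connection (call Close()).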