Errors when many clients connect to a Go server

The full code can be downloaded at https://groups.google.com/forum/#!topic/golang-nuts/e1Ir__Dq_gE

Could anyone help me improve this sample code until it has no bugs? I think it would help all of us develop bug-free client/server code.

My development steps:

  1. Create a server that handles multiple connections using goroutines.
  2. Build a client that works correctly with a simple protocol.
  3. Extend the client to simulate multiple clients (option -n, 1000 clients by default).
  4. TODO: try to reduce locking in the server.
  5. TODO: try to use bufio to improve throughput.

I found that this code is very unstable, with three problems:

  1. With 1000 clients, one of them gets an EOF when reading from the server.
  2. With 1050 clients, "too many open files" appears almost immediately (no clients get opened at all).
  3. With 1020 clients, I get a runtime error with a long stack trace:

    Start pollServer: pipe: too many open files
    panic: runtime error: invalid memory address or nil pointer dereference
    
    [signal 0xb code=0x1 addr=0x28 pc=0x4650d0]
    

Here is a more simplified version of my code.

const ClientCount = 1000
func main() {
    srvAddr := "127.0.0.1:10000"
    var wg sync.WaitGroup
    wg.Add(ClientCount)
    for i := 0; i < ClientCount; i++ {
        go func(i int) {
            client(i, srvAddr)
            wg.Done()
        }(i)
    }
    wg.Wait()
}
func client(i int, srvAddr string) {
    conn, e := net.Dial("tcp", srvAddr)
    if e != nil {
        log.Fatalln("Err:Dial():", e)
    }
    defer conn.Close()
    conn.SetTimeout(proto.LINK_TIMEOUT_NS)

    l1 := proto.L1{uint32(i), uint16(rand.Uint32() % 10000)}
    log.Println(conn.LocalAddr(), "WL1", l1)
    e = binary.Write(conn, binary.BigEndian, &l1)
    if e == os.EOF {
        return
    }
    if e != nil {
        return
    }
    // ...
}
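
For comparison, here is a minimal sketch of the same client loop that caps how many connections are open at once and returns errors instead of calling log.Fatalln, which would kill every client on the first failed dial. It is not the original program: the names runClients, startEchoServer and maxInFlight are hypothetical, the one-byte echo exchange stands in for the proto.L1 message, and the 5 second deadline is an assumed value. It uses current Go APIs (SetDeadline, io.EOF) rather than the older SetTimeout and os.EOF shown above.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync"
	"time"
)

// runClients runs `clients` simulated clients against addr while keeping at
// most maxInFlight connections open at once. It returns the number of
// clients that failed. (Hypothetical helper, not part of the original code.)
func runClients(addr string, clients, maxInFlight int) int {
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		failed int
	)
	sem := make(chan struct{}, maxInFlight) // buffered channel as a counting semaphore
	for i := 0; i < clients; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxInFlight clients are still active
		go func(id int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this client finishes
			if err := client(id, addr); err != nil {
				mu.Lock()
				failed++
				mu.Unlock()
			}
		}(i)
	}
	wg.Wait()
	return failed
}

// client sends one byte and waits for it to be echoed back. Errors are
// returned to the caller so one failure does not abort the whole run.
func client(id int, addr string) error {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return fmt.Errorf("client %d: dial: %w", id, err)
	}
	defer conn.Close() // a single deferred Close is enough
	// Assumed timeout; the original uses proto.LINK_TIMEOUT_NS.
	if err := conn.SetDeadline(time.Now().Add(5 * time.Second)); err != nil {
		return fmt.Errorf("client %d: set deadline: %w", id, err)
	}
	buf := []byte{byte(id)}
	if _, err := conn.Write(buf); err != nil {
		return fmt.Errorf("client %d: write: %w", id, err)
	}
	if _, err := io.ReadFull(conn, buf); err != nil {
		return fmt.Errorf("client %d: read: %w", id, err) // io.EOF lands here
	}
	return nil
}

// startEchoServer listens on an ephemeral loopback port and echoes every
// byte back, one goroutine per connection. It stands in for the real server.
func startEchoServer() (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func() {
				defer c.Close()
				io.Copy(c, c) // echo until the client closes its side
			}()
		}
	}()
	return ln, nil
}

func main() {
	ln, err := startEchoServer()
	if err != nil {
		fmt.Println("listen:", err)
		return
	}
	defer ln.Close()
	failed := runClients(ln.Addr().String(), 1000, 100)
	fmt.Println("failed clients:", failed)
}
```

With the cap in place, the number of descriptors held by the client process stays near maxInFlight no matter how many clients are simulated in total, so the process can stay under its open-file limit without changing it.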

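This answer on Server Fault [1] suggests that for servers that must handle a large number of connections, raising the open-file limit with ulimit is the thing to do. Every TCP connection holds one file descriptor, and a common default soft limit is 1024 per process, which is consistent with the failures appearing between 1020 and 1050 clients. Also check the application for memory leaks or file descriptor leaks using lsof.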

    ulimit -n 99999
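
The limit can also be inspected from inside the Go program. The following is a minimal sketch, assuming Linux (the syscall package and the Rlimit field types differ across platforms); openFileLimit and raiseOpenFileLimit are hypothetical helper names, not part of the original code. An unprivileged process can raise its soft limit only up to the hard limit, so the request is clamped to that.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit returns the soft and hard limits on open file descriptors
// for the current process (RLIMIT_NOFILE).
func openFileLimit() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return uint64(rl.Cur), uint64(rl.Max), nil
}

// raiseOpenFileLimit tries to raise the soft limit to want, clamped to the
// hard limit. It never lowers the soft limit. It returns the soft limit in
// effect afterwards.
func raiseOpenFileLimit(want uint64) (uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	if want > rl.Max {
		want = rl.Max // cannot exceed the hard limit without privileges
	}
	if want <= rl.Cur {
		return rl.Cur, nil // already high enough, nothing to do
	}
	rl.Cur = want
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	return rl.Cur, nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("getrlimit:", err)
		return
	}
	fmt.Printf("open files: soft=%d hard=%d\n", soft, hard)

	// 99999 mirrors the ulimit value above; the result depends on the hard limit.
	now, err := raiseOpenFileLimit(99999)
	if err != nil {
		fmt.Println("setrlimit:", err)
		return
	}
	fmt.Println("soft limit now:", now)
}
```

Printing the limits at startup makes it easy to tell whether a "too many open files" error comes from a low limit or from descriptors that are never closed.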

[1] https://serverfault.com/a/48820/110909