How can I get the size of the available TCP data?

Problem

I have a use case where I need to Peek at exactly the first TCP packet, whatever length it may be.

Snippet

I would have expected this to work:

conn, err := sock.Accept()
if nil != err {
    panic(err)
}

// plenty of time for the first packet to arrive
time.Sleep(2500 * time.Millisecond)

bufConn := bufio.NewReader(conn)
n := bufConn.Buffered()
fmt.Fprintf(os.Stdout, "Size of Buffered Data %d\n", n)

However, even though I am positive that the data has arrived, it still shows that 0 bytes are buffered.

Full Test Application

Here's a full test program:

package main

import (
    "bufio"
    "fmt"
    "net"
    "os"
    "strconv"
    "time"
)

func main() {
    addr := ":" + strconv.Itoa(4080)
    sock, err := net.Listen("tcp", addr)
    if nil != err {
        panic(err)
    }
    conn, err := sock.Accept()
    if nil != err {
        panic(err)
    }

    bufConn := bufio.NewReader(conn)
    var n int
    for {
        n = bufConn.Buffered()
        fmt.Fprintf(os.Stdout, "Size of Buffered Data %d\n", n)
        if 0 != n {
            break
        }
        time.Sleep(2500 * time.Millisecond)
    }
    first, err := bufConn.Peek(n)
    if nil != err {
        panic(err)
    }
    fmt.Fprintf(os.Stdout, "[Message] %s\n", first)
}

Testing

And how I've been testing:

telnet localhost 4080

Hello, World!

This works equally well:

echo "Hello, World!" | nc localhost 4080

However, if I call Peek(14) directly, the data is clearly there.

Why?

I'm dealing with an application-specific use case - magic byte detection when multiplexing multiple protocols over a single port.

In theory, packet sizes are unreliable, but in practice a small hello packet of a few bytes will not be made smaller by any routers in the path, and the application will not send more data until it receives the handshake response.

The Kicker

I'm supporting exactly one protocol that requires the server to send its hello packet first. This means that if no packet has been received after a wait of 250ms, the server will assume that this special protocol is being used and send its hello.

Hence, it would be best if I could tell whether data exists in the underlying buffer without doing any Read() or Peek() first.

Update: Can't be done with net.Conn

Actually, it is not possible to "Peek" at a net.Conn without reading. However, net.Conn can be wrapped, and that wrapper can be passed around anywhere a net.Conn is accepted.

See

Workable Half-Solution

The ideal solution would be to be able to Peek immediately on the first try. While searching around I did find some custom Go TCP libraries... but I'm not feeling adventurous enough to try that yet.

Building off of what @SteffenUllrich said, it turns out that bufConn.Peek(1) will cause the buffer to be filled with the available data. After that, bufConn.Buffered() returns the expected number of bytes and it's possible to proceed with bufConn.Peek(n):

// Cause bufConn to fill its buffer with the available data
_, err = bufConn.Peek(1)
if nil != err {
    panic(err)
}

// Check the size now
n = bufConn.Buffered()
fmt.Fprintf(os.Stdout, "Size of Buffered Data %d\n", n)

// Peek the full amount of available data
firstPacket, err := bufConn.Peek(n)
if nil != err {
    panic(err)
}
fmt.Fprintf(os.Stdout, "[Message] %s\n", firstPacket)

I thought I had tried this earlier and saw the buffer only filled with 1 byte, but reading the answer above caused me to create a specific test case to be sure, and it worked.

The Downside

This still requires a Read()/Peek() before knowing the size of the data.

This means that for my particular case, where one supported protocol requires the server to send the first hello packet, I have to store state about the connection somewhere else: if enough time has passed (say 250ms) without any data being received, I know to skip first-packet detection when data eventually arrives.

Answer

> I have a use case where I need to Peek at exactly the first TCP packet, whatever length it may be.

TCP is a streaming protocol, not a datagram protocol like UDP. This means packets are irrelevant from the perspective of an application using TCP: they only exist temporarily on the wire.

Any data the application sends will be put into the continuous send buffer and then packetized by the operating system for transport. This means multiple writes by the application can end up in a single packet, a single write can be split across multiple packets, and so on. If data are lost during transport (i.e. no ACK), the sender's OS can even retransmit them in differently sized packets.

Similarly, packets received on the wire will be reassembled inside the OS kernel and put into the continuous read buffer. All packet boundaries which might have existed on the wire are lost in the process. Therefore, there is no way for the application to find out where a packet boundary was.

    n = bufConn.Buffered()

bufConn is not the OS socket buffer. bufConn.Buffered() only reports data that has already been read from the underlying socket into the Go process but not yet consumed by the application logic via bufConn.Read(). If you try to read a single byte with bufConn.Read(), it will actually try to read more bytes from the underlying socket, return the single byte you requested, and keep the rest in the bufConn buffer for later reads. This is done to provide a more efficient interface for the application logic. If you don't want this, don't use buffered I/O.