HTTP memory use per persistent connection

I'm writing a Go web server that sends Server-Sent Events to a load of clients. I'd like it to support tens of thousands of simultaneous connections. Here is my code (it just keeps the connection open and sends keep-alive events):

func handleTest(w http.ResponseWriter, r *http.Request) {
    h := w.Header()
    h.Set("Content-Type", "text/event-stream; charset=utf-8")
    h.Set("Cache-Control", "no-cache, no-store, must-revalidate")
    h.Set("Connection", "keep-alive")

    flusher := w.(http.Flusher)
    notifier := w.(http.CloseNotifier)

    flusher.Flush()

    // Just send keep-alives.
    keepAliveTime := 5 * time.Second
    keepAlive := time.NewTimer(keepAliveTime)
    defer keepAlive.Stop()

    for {
        select {
        case <-notifier.CloseNotify():
            // The connection has been closed.
            return

        case <-keepAlive.C:
            if _, err := io.WriteString(w, "event: keep-alive\ndata: null\n\n"); err != nil {
                log.Println(err)
                return
            }
            flusher.Flush()
            keepAlive.Reset(keepAliveTime)
        }
    }
}

With 1000 connections, Windows reports about 70 kB of RAM use per connection. If I add in everything I am actually doing (there's another goroutine, and some minor event encoding functions), it balloons to 300 kB per connection. This seems like a lot. With 1000 connections, here is what pprof heap says:

14683.25kB of 14683.25kB total (  100%)
Dropped 12 nodes (cum <= 73.42kB)
Showing top 10 nodes out of 23 (cum >= 512.19kB)
      flat  flat%   sum%        cum   cum%
11091.50kB 75.54% 75.54% 11091.50kB 75.54%  io.copyBuffer
    2053kB 13.98% 89.52%     2053kB 13.98%  net/http.newBufioWriterSize
     514kB  3.50% 93.02%      514kB  3.50%  net/http.newBufioReader
  512.56kB  3.49% 96.51%   512.56kB  3.49%  runtime.makeslice
  512.19kB  3.49%   100%   512.19kB  3.49%  net.newFD
         0     0%   100% 11091.50kB 75.54%  io.Copy
         0     0%   100%  1540.19kB 10.49%  main.main
         0     0%   100%   512.19kB  3.49%  net.(*TCPListener).AcceptTCP
         0     0%   100%   512.19kB  3.49%  net.(*netFD).accept
         0     0%   100%   512.19kB  3.49%  net.(*netFD).acceptOne

So I have a few questions:

  1. Why is the memory use so seemingly high? I would have expected something like 10 kB per connection.
  2. Why does pprof think the heap is 14 MB, but Windows says the memory use is 70 MB? Is the rest the stack?
  3. Is there any way I can transfer control of the HTTP response to a central goroutine, and return from handleTest() without closing the connection? Would that save me memory or is the memory use all in the http.ResponseWriter object?

Edit: For 3. it looks like I can use Hijacker

Edit 2: I tried reimplementing it using Hijacker. It reduced memory usage to about 10 kB per connection, which is much more reasonable!
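For reference, here is a minimal sketch of what a Hijacker-based version might look like. It is not the code measured above. The names handleHijack, keepAliveLoop, demo, responseHead and demoInterval are made up for illustration, and the short interval and loopback demo exist only so the program terminates on its own. The handler takes over the TCP connection, writes the response head by hand, and passes the connection to one central goroutine, so no goroutine stays parked per client.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net"
	"net/http"
	"time"
)

// After Hijack the server no longer writes the status line or headers,
// so the response head has to be sent by hand.
const responseHead = "HTTP/1.1 200 OK\r\n" +
	"Content-Type: text/event-stream; charset=utf-8\r\n" +
	"Cache-Control: no-cache, no-store, must-revalidate\r\n" +
	"Connection: close\r\n\r\n"

var keepAliveEvent = []byte("event: keep-alive\ndata: null\n\n")

// demoInterval is deliberately short so the demo finishes quickly.
const demoInterval = 100 * time.Millisecond

// handleHijack takes over the connection and hands it to the central
// goroutine, so the handler can return without closing the connection.
func handleHijack(conns chan<- net.Conn) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		hj, ok := w.(http.Hijacker)
		if !ok { // for example, HTTP/2 connections cannot be hijacked
			http.Error(w, "hijacking not supported", http.StatusInternalServerError)
			return
		}
		conn, _, err := hj.Hijack()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		if _, err := conn.Write([]byte(responseHead)); err != nil {
			conn.Close()
			return
		}
		conns <- conn
	}
}

// keepAliveLoop owns every hijacked connection and writes a keep-alive
// event to each one per tick, dropping connections whose write fails.
func keepAliveLoop(conns <-chan net.Conn, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	open := make(map[net.Conn]struct{})
	for {
		select {
		case c := <-conns:
			open[c] = struct{}{}
		case <-ticker.C:
			for c := range open {
				// Bound each write so one slow client cannot stall the loop.
				c.SetWriteDeadline(time.Now().Add(time.Second))
				if _, err := c.Write(keepAliveEvent); err != nil {
					c.Close()
					delete(open, c)
				}
			}
		}
	}
}

// demo serves on a loopback port, connects one client, and returns the
// first event that client receives.
func demo() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	conns := make(chan net.Conn)
	go keepAliveLoop(conns, demoInterval)

	mux := http.NewServeMux()
	mux.HandleFunc("/test", handleHijack(conns))
	go http.Serve(ln, mux)

	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get("http://" + ln.Addr().String() + "/test")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	buf := make([]byte, len(keepAliveEvent))
	n, err := io.ReadFull(resp.Body, buf)
	return string(buf[:n]), err
}

func main() {
	event, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("first event: %q\n", event)
}
```

In a real server the same handler would presumably be registered with http.ListenAndServe and a 5 second interval. Note that with this approach a closed client is only noticed when a write to it fails, and the central loop writes to the connections one after another.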

Why does pprof think the heap is 14 MB, but Windows says the memory use is 70 MB? Is the rest the stack?

Besides the heap, the process also holds the Go runtime itself, the goroutine stacks, and the code segment. The OS may also allocate more than is actually needed. Also, is the figure reported by Windows the resident memory, or the total memory the OS has allocated to the process?
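One way to check how much of the gap is stack rather than heap is to ask the runtime for its own accounting. The sketch below uses runtime.ReadMemStats; the memBreakdown type and readMemBreakdown function are hypothetical helpers introduced here for illustration. The Sys figure is what the Go runtime has obtained from the OS, which will not necessarily match the number Windows shows for the process.

```go
package main

import (
	"fmt"
	"runtime"
)

// memBreakdown is a hypothetical helper type holding the runtime's own
// view of where memory went.
type memBreakdown struct {
	HeapAlloc  uint64 // bytes of allocated heap objects
	HeapSys    uint64 // heap memory obtained from the OS, including unused spans
	StackInuse uint64 // bytes in use by goroutine stacks
	Sys        uint64 // total bytes the Go runtime obtained from the OS
	Goroutines int    // number of live goroutines
}

// readMemBreakdown takes a snapshot of the runtime memory statistics.
func readMemBreakdown() memBreakdown {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return memBreakdown{
		HeapAlloc:  m.HeapAlloc,
		HeapSys:    m.HeapSys,
		StackInuse: m.StackInuse,
		Sys:        m.Sys,
		Goroutines: runtime.NumGoroutine(),
	}
}

func main() {
	b := readMemBreakdown()
	fmt.Printf("goroutines:  %d\n", b.Goroutines)
	fmt.Printf("heap alloc:  %d kB\n", b.HeapAlloc/1024)
	fmt.Printf("heap sys:    %d kB\n", b.HeapSys/1024)
	fmt.Printf("stack inuse: %d kB\n", b.StackInuse/1024)
	fmt.Printf("total sys:   %d kB\n", b.Sys/1024)
}
```

Calling readMemBreakdown from the server while the 1000 connections are open, and dividing StackInuse by the goroutine count, should give a rough idea of the per-connection stack cost.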