Reusing an io.Reader and capturing its length

I'm consuming a remote API that returns an io.ReadCloser, so there is no way to get the length without reading it. I need to prepend a header containing the length of the original data, then write the resulting io.Reader to an io.Writer.

Apart from omitting error handling, the following functions are functionally correct. However, addHeaderBytesBuffer results in 7 allocations and addHeaderTeeReader results in 8:

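// addHeaderBytesBuffer buffers all of in, then returns the header followed by the buffered data.
func addHeaderBytesBuffer(in io.Reader) (out io.Reader) {
    buf := new(bytes.Buffer)
    io.Copy(buf, in) // error ignored for brevity

    header := createHeader(buf.Len())
    return io.MultiReader(header, buf)
}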

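// addHeaderTeeReader drains in through a TeeReader so that buf receives a copy
// of every byte while io.Copy counts them.
func addHeaderTeeReader(in io.Reader) (out io.Reader) {
    buf := new(bytes.Buffer)
    n, _ := io.Copy(ioutil.Discard, io.TeeReader(in, buf)) // error ignored for brevity

    return io.MultiReader(createHeader(int(n)), buf)
}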

func createHeader(length int) (out io.Reader) {
    return strings.NewReader(fmt.Sprintf("HEADER:%d", length))
}
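For reference, here is a minimal sketch of one way to measure allocation counts like those quoted above, using testing.AllocsPerRun from the standard library. The helper name allocsPerCall is made up for illustration, and io.Discard assumes Go 1.16 or later. The count includes the allocation for the bytes.NewReader input, so the exact figure depends on the harness and the Go version.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"testing"
)

// createHeader and addHeaderBytesBuffer repeat the functions from the question.
func createHeader(length int) io.Reader {
	return strings.NewReader(fmt.Sprintf("HEADER:%d", length))
}

func addHeaderBytesBuffer(in io.Reader) io.Reader {
	buf := new(bytes.Buffer)
	io.Copy(buf, in) // error ignored, as in the question
	return io.MultiReader(createHeader(buf.Len()), buf)
}

// allocsPerCall (hypothetical helper) reports the average number of heap
// allocations for one call of addHeader plus draining its result.
// The input reader created inside the closure is counted as well.
func allocsPerCall(addHeader func(io.Reader) io.Reader, payload []byte) float64 {
	return testing.AllocsPerRun(100, func() {
		out := addHeader(bytes.NewReader(payload))
		io.Copy(io.Discard, out) // Go 1.16+; ioutil.Discard on older versions
	})
}
```

Calling allocsPerCall(addHeaderBytesBuffer, payload) with a representative payload gives a number that can be compared between variants.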

I'm maintaining a sync.Pool of bytes.Buffer instances to reuse, which reduces GC pressure, but is there a more efficient way to do this?