I need to create a buffered reader on top of an existing child `io.Reader`, but that reader must support seeking within data that has already been read from the child and buffered. So when n bytes have already been read, I want to be able to reset the reader to offset 0 and read that chunk again. Unfortunately, `bufio.Reader` doesn't support seeking.

Is there a standard reader that supports this, or do I have to implement my own?
The purpose of `bufio` is to provide buffered I/O. Buffered I/O is intended for performance, not time travel. You can just read your data into a byte slice, then use `bytes.Reader` to process it further.
I wondered whether seeking in the opened `os.File` and then calling `Reset` on the `bufio.Reader` was an answer, and it sort of is, but it's not ideal. First of all, the documentation of `Reset` says it "discards any buffered data", so data that was already buffered gets read from the file again; then again, doesn't the operating system also cache recently read file contents?

Secondly, it works correctly in the sense that the `bufio.Reader` starts reading and buffering from the designated file position, but it doesn't consider sector alignment: it fills a whole buffer at a time, irrespective of the starting point. Thus, assuming the buffer size (4096 bytes by default) equals the file system's cluster size, and unless the starting point is aligned to a cluster boundary, every buffer fill will span 2 clusters. I'm not saying that the impact on performance is noticeable. In fact, because `bufio` reads ahead as much as possible, performance may well be better than what my alignment obsession might yield.
I think this code demonstrates this, reading a few 2000-byte chunks from the executable itself:
```go
package main

import (
	"bufio"
	"crypto/md5"
	"fmt"
	"io"
	"os"
)

// readBytes fills block one byte at a time from r.
func readBytes(r *bufio.Reader, block []byte) {
	for i := 0; i < len(block); i++ {
		var err error
		block[i], err = r.ReadByte()
		if err != nil {
			panic(err)
		}
	}
}

// status prints the OS-level file position, the number of bytes buffered
// in r, and the MD5 of the last block read.
func status(f *os.File, r *bufio.Reader, block []byte, what string) {
	fpos, err := f.Seek(0, io.SeekCurrent)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: fpos=%5d, buffered=%4d, md5=%X\n", what, fpos, r.Buffered(), md5.Sum(block))
}

func main() {
	f, err := os.Open(os.Args[0])
	if err != nil {
		panic(err)
	}
	defer f.Close()

	r := bufio.NewReader(f)
	block := make([]byte, 2000)
	status(f, r, block, "initial")
	readBytes(r, block)
	status(f, r, block, "block 1")
	readBytes(r, block)
	status(f, r, block, "block 2")
	readBytes(r, block)
	status(f, r, block, "block 3")

	// Return to the start of block 2 and discard bufio's buffer.
	if _, err := f.Seek(2000, io.SeekStart); err != nil {
		panic(err)
	}
	r.Reset(f)
	readBytes(r, block)
	status(f, r, block, "block 2")
	readBytes(r, block)
	status(f, r, block, "block 3")
	readBytes(r, block)
	status(f, r, block, "block 4")
}
```
Typical output (showing that after seeking, the file position no longer lands on a multiple of 4096):
```
initial: fpos=    0, buffered=   0, md5=CF40A1DE3F93B4A025409B5EFA5AA210
block 1: fpos= 4096, buffered=2096, md5=C7015DD984AB85CCCBD206BA8243647D
block 2: fpos= 4096, buffered=  96, md5=E0D75F4A6DE681316515F5CD53F0D95C
block 3: fpos= 8192, buffered=2192, md5=7961B1A889E9793344374B3022314CD0
block 2: fpos= 6096, buffered=2096, md5=E0D75F4A6DE681316515F5CD53F0D95C
block 3: fpos= 6096, buffered=  96, md5=7961B1A889E9793344374B3022314CD0
block 4: fpos=10192, buffered=2192, md5=2A2F77C23EF4651E630855D9C3AA29DE
```
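The two-cluster claim can be checked against those numbers: each fill reads 4096 bytes, so a fill starting at offset s touches clusters s/4096 through (s+4095)/4096 (integer division). A small sketch, assuming 4096-byte clusters; the helper `clusterSpan` is hypothetical.

```go
package main

import "fmt"

// clusterSpan returns the index of the first and last cluster touched by a
// read of length bytes starting at byte offset start (length must be > 0).
func clusterSpan(start, length, clusterSize int64) (first, last int64) {
	return start / clusterSize, (start + length - 1) / clusterSize
}

func main() {
	const size = 4096 // assumed to be both the bufio buffer and cluster size
	// Start offsets of bufio's 4096-byte fills in the run above: 0 and 4096
	// before the seek, 2000 and 6096 after seeking to offset 2000.
	for _, start := range []int64{0, 4096, 2000, 6096} {
		first, last := clusterSpan(start, size, size)
		fmt.Printf("fill at %5d: clusters %d..%d\n", start, first, last)
	}
	// Output:
	// fill at     0: clusters 0..0
	// fill at  4096: clusters 1..1
	// fill at  2000: clusters 0..1
	// fill at  6096: clusters 1..2
}
```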