I am trying to compress byte slices in Go using compress/gzip. Whenever I compress slices longer than 2^15 bytes on my laptop, every byte at index 2^15 or greater is set to 0 after decompression. The same code also breaks on my research cluster.
Calling go version on my laptop prints:
$ go version
go version go1.5 darwin/amd64
Calling go version on the cluster prints:
$ go version
go version go1.3.3 linux/amd64
Below is a test file that demonstrates the problem. It generates random slices of different lengths, compresses them, then decompresses them. It checks that no call returns an error and that the original and decompressed slices are the same:
package compress

import (
	"bytes"
	"compress/gzip"
	"math/rand"
	"testing"
)

func byteSliceEq(xs, ys []byte) bool {
	if len(xs) != len(ys) { return false }
	for i := range xs {
		if xs[i] != ys[i] { return false }
	}
	return true
}

func TestGzip(t *testing.T) {
	tests := []struct {
		n int
	}{
		{ 1<<10 },
		{ 1<<15 },
		{ 1<<15 + 1 },
		{ 1<<20 },
	}
	rand.Seed(0)
	for i := range tests {
		n := tests[i].n
		in, out := make([]byte, n), make([]byte, n)
		buf := &bytes.Buffer{}
		for i := range in { in[i] = byte(rand.Intn(256)) }
		writer := gzip.NewWriter(buf)
		_, err := writer.Write(in)
		if err != nil {
			t.Errorf("%d) n = %d: writer.Write() error: %s",
				i + 1, n, err.Error())
		}
		err = writer.Close()
		if err != nil {
			t.Errorf("%d) n = %d: writer.Close() error: %s",
				i + 1, n, err.Error())
		}
		reader, err := gzip.NewReader(buf)
		if err != nil {
			t.Errorf("%d) n = %d: gzip.NewReader error: %s",
				i + 1, n, err.Error())
		}
		reader.Read(out)
		err = reader.Close()
		if err != nil {
			t.Errorf("%d) n = %d: reader.Close() error: %s",
				i + 1, n, err.Error())
		}
		if !byteSliceEq(in, out) {
			idx := -1
			for i := range in {
				if in[i] != out[i] {
					idx = i
					break
				}
			}
			t.Errorf("%d) n = %d: in[%d] = %d, but out[%d] = %d",
				i + 1, n, idx, in[idx], idx, out[idx])
		}
	}
}
When I run this test, I get the following output:
$ go test --run "TestGzip"
--- FAIL: TestGzip (0.12s)
gzip_test.go:77: 3) n = 32769: in[32768] = 78, but out[32768] = 0
gzip_test.go:77: 4) n = 1048576: in[32768] = 229, but out[32768] = 0
FAIL
exit status 1
Does anyone know what is going on here? Am I misusing the package in some way? Let me know if I haven't given enough information.
The problem is in this line:
reader.Read(out)
There is no guarantee that Reader.Read() will read the whole out slice in one step.
gzip.Reader.Read() implements io.Reader.Read().
Quoting from its doc (the "general contract"):
Read(p []byte) (n int, err error)
Read reads up to len(p) bytes into p.
So Reader.Read() is not required to fill out; it may stop after fewer bytes if the implementation so chooses, even if EOF has not been reached. With a "big" slice this can easily happen when an internal cache of the implementation is exhausted. Your test output suggests that the single Read() call delivered only the first 2^15 bytes; the rest of out kept the zero values it got from make(), which is why you see zeros from index 2^15 onward. Read() returns the number of bytes read (and an error), which you can use to check whether the full slice was read.
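To make the point about the return values concrete, here is a minimal sketch of a loop that keeps calling Read() until the slice is full. The names readFully, gzipBytes and gunzipN are hypothetical helpers made up for this illustration; they are not part of compress/gzip or of your test file.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// readFully is a hypothetical helper: it calls r.Read repeatedly until p is
// full or an error occurs, and returns the total number of bytes read.
func readFully(r io.Reader, p []byte) (int, error) {
	total := 0
	for total < len(p) {
		n, err := r.Read(p[total:])
		total += n // Read may return n > 0 together with an error, so count first.
		if err != nil {
			if err == io.EOF && total == len(p) {
				return total, nil // EOF arrived exactly when p became full.
			}
			return total, err // Includes io.EOF when the stream ended early.
		}
	}
	return total, nil
}

// gzipBytes (hypothetical) compresses in and returns the gzip stream.
func gzipBytes(in []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write(in); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipN (hypothetical) decompresses exactly n bytes from data via readFully.
func gunzipN(data []byte, n int) ([]byte, error) {
	r, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	out := make([]byte, n)
	if _, err := readFully(r, out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	in := make([]byte, 1<<15+1) // one byte past the size where the test failed
	for i := range in {
		in[i] = byte(i * 31)
	}
	data, err := gzipBytes(in)
	if err != nil {
		panic(err)
	}
	out, err := gunzipN(data, len(in))
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip ok:", bytes.Equal(in, out))
}
```

Note that the loop adds n to the total before looking at err, because the io.Reader contract allows a call to return data and an error at the same time.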
Better still, use io.ReadFull() to make sure out is filled completely:
if _, err = io.ReadFull(reader, out); err != nil {
	t.Errorf("Error reading full out slice: %v", err)
}
By applying this change (and adding "io" to the imports), your test passes.
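If you want to see the difference outside of a test, the following standalone sketch compares how many bytes a single Read() call delivers with how many io.ReadFull() delivers for the same gzip stream. The helpers compressBytes, singleRead and fullRead are hypothetical names chosen for this example, and the count from the single Read() may differ between Go versions, so I am not quoting an expected number.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compressBytes (hypothetical) compresses in and returns the gzip stream.
func compressBytes(in []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write(in); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// singleRead (hypothetical) reports how many bytes ONE Read call delivers
// into a buffer of size n. The result may be smaller than n.
func singleRead(data []byte, n int) (int, error) {
	r, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return 0, err
	}
	defer r.Close()
	got, err := r.Read(make([]byte, n))
	if err == io.EOF {
		err = nil // EOF alongside data is not a failure here.
	}
	return got, err
}

// fullRead (hypothetical) reports how many bytes io.ReadFull delivers into a
// buffer of size n. io.ReadFull returns a nil error only if it read n bytes.
func fullRead(data []byte, n int) (int, error) {
	r, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return 0, err
	}
	defer r.Close()
	return io.ReadFull(r, make([]byte, n))
}

func main() {
	in := make([]byte, 1<<20)
	for i := range in {
		in[i] = byte(i * 31)
	}
	data, err := compressBytes(in)
	if err != nil {
		panic(err)
	}
	one, err := singleRead(data, len(in))
	if err != nil {
		panic(err)
	}
	full, err := fullRead(data, len(in))
	if err != nil {
		panic(err)
	}
	fmt.Printf("single Read: %d bytes, io.ReadFull: %d bytes, wanted: %d\n",
		one, full, len(in))
}
```

io.ReadFull() fits when you know the decompressed length in advance, as in your test.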