Can't gzip slices with more than 32768 bytes in Go 1.5 on Mac OS X

I am trying to compress byte slices in Go using compress/gzip. Whenever I compress a slice longer than 2^15 bytes on my laptop, every byte with an index of 2^15 or greater is set to 0 after decompression. The same code also breaks when I run it on my research cluster.

Calling go version on my laptop prints:

$ go version
go version go1.5 darwin/amd64

Calling go version on the cluster prints:

$ go version
go version go1.3.3 linux/amd64

Below is a demonstrative test file that I wrote. It generates random slices of different lengths, compresses them, and then decompresses them. It checks that no call returns an error, and also checks that each decompressed slice matches the original:

package compress

import (
    "bytes"
    "compress/gzip"
    "math/rand"
    "testing"
)

func byteSliceEq(xs, ys []byte) bool {
    if len(xs) != len(ys) { return false }
    for i := range xs {
        if xs[i] != ys[i] { return false }
    }
    return true
}

func TestGzip(t *testing.T) {
    tests := []struct {
        n int
    }{
        { 1<<10 },
        { 1<<15 },
        { 1<<15 + 1 },
        { 1<<20 },
    }

    rand.Seed(0)

    for i := range tests {
        n := tests[i].n

        in, out := make([]byte, n), make([]byte, n)
        buf := &bytes.Buffer{}
        for i := range in { in[i] = byte(rand.Intn(256)) }

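        // Compress the random input into buf.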
        writer := gzip.NewWriter(buf)
        _, err := writer.Write(in)
        if err != nil {
            t.Errorf("%d) n = %d: writer.Write() error: %s",
                i + 1, n, err.Error())
        }
        err = writer.Close()
        if err != nil {
            t.Errorf("%d) n = %d: writer.Close() error: %s",
                i + 1, n, err.Error())
        }

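        // Decompress buf back into out.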
        reader, err := gzip.NewReader(buf)
        if err != nil {
            t.Errorf("%d) n = %d: gzip.NewReader error: %s",
                i + 1, n, err.Error())
        }
        reader.Read(out)
        err = reader.Close()
        if err != nil {
            t.Errorf("%d) n = %d: reader.Close() error: %s",
                i + 1, n, err.Error())
        }

        if !byteSliceEq(in, out) {
            idx := -1
            for i := range in {
                if in[i] != out[i] {
                    idx = i
                    break
                }
            }
            t.Errorf("%d) n = %d: in[%d] = %d, but out[%d] = %d",
                i + 1, n, idx, in[idx], idx, out[idx])
        }
    }
}

When I run this test, I get the following output:

$ go test --run "TestGzip"
--- FAIL: TestGzip (0.12s)
    gzip_test.go:77: 3) n = 32769: in[32768] = 78, but out[32768] = 0
    gzip_test.go:77: 4) n = 1048576: in[32768] = 229, but out[32768] = 0
FAIL
exit status 1

Does anyone know what is going on here? Am I misusing the package in some way? Let me know if I haven't given enough information.

The problem is in this line:

reader.Read(out)

There is no guarantee that Reader.Read() will read the whole out slice in one step.

gzip.Reader.Read() implements io.Reader.Read().
Quoting the "general contract" from its documentation:

Read(p []byte) (n int, err error)

Read reads up to len(p) bytes into p.

There is no guarantee that Reader.Read() will read until out is filled; it may stop after fewer bytes even if EOF has not been reached. When you pass a "big" slice, this can easily happen once the implementation's internal buffer is exhausted. Read() returns the number of bytes read (and an error), so you can use that to check whether the full slice was read.
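
To illustrate the contract, a manual loop that keeps calling Read() until out is filled would look something like this (just a sketch, assuming "io" is imported; it is essentially what io.ReadFull() below does for you):

total := 0
for total < len(out) {
    n, err := reader.Read(out[total:])
    total += n
    if err == io.EOF {
        break // reader is exhausted
    }
    if err != nil {
        t.Fatalf("read error after %d bytes: %v", total, err)
    }
}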

Or, even better, use io.ReadFull() to make sure out is filled completely:

if _, err = io.ReadFull(reader, out); err != nil {
    t.Errorf("Error reading full out slice: %v", err)
}

By applying this change (and adding "io" to your imports), your test passes.
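
As a side note, if the decompressed size were not known in advance, a common pattern is to copy the gzip reader into a bytes.Buffer instead of reading into a preallocated slice. A minimal sketch (the variable names here are illustrative):

var decompressed bytes.Buffer
// io.Copy keeps reading from the gzip reader until EOF,
// so it cannot stop short the way a single Read() call can.
if _, err := io.Copy(&decompressed, reader); err != nil {
    t.Errorf("io.Copy error: %v", err)
}
got := decompressed.Bytes() // the fully decompressed data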