I'm struggling to handle nested zip files in Go (where a zip file contains another zip file). I'm trying to recurse a zip file and list all of the files it contains.
archive/zip gives you two functions for opening a zip file: OpenReader opens a file on disk, and NewReader accepts an io.ReaderAt and a file size. As you iterate through the zipped files with either of these, you get a zip.File for each file inside the zip. To get the contents of a file f, you call f.Open, which gives you an io.ReadCloser. To open a nested zip file, I'd need to use NewReader, but neither zip.File nor the io.ReadCloser returned by f.Open satisfies the io.ReaderAt interface.
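The mismatch described above can be checked with a type assertion. This is only a sketch: supportsReadAt and demo are hypothetical helpers, and io.NopCloser (Go 1.16+) merely stands in for the kind of stream that f.Open returns.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// supportsReadAt reports whether v could be handed to zip.NewReader as-is.
func supportsReadAt(v interface{}) bool {
	_, ok := v.(io.ReaderAt)
	return ok
}

// demo checks one random-access reader and one plain stream.
func demo() (randomAccess, stream bool) {
	randomAccess = supportsReadAt(bytes.NewReader([]byte("data")))
	// io.NopCloser stands in for the io.ReadCloser returned by f.Open:
	// it exposes Read and Close, but no ReadAt.
	stream = supportsReadAt(io.NopCloser(strings.NewReader("data")))
	return randomAccess, stream
}

func main() {
	fmt.Println(demo())
}
```

A *bytes.Reader passes the check while the plain stream does not, which is why a nested entry cannot be passed to NewReader directly.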
zip.File has a private field zipr, which is an io.ReaderAt, and zip.ReadCloser (the type returned by OpenReader) has a private field f, which is an *os.File. Either of these should satisfy the requirements for NewReader.
My question: is there any way to open a nested zip file without first writing the contents to a file on disk or reading the whole thing into memory?
It looks like everything that is needed is available in zip.File, but isn't exported. I'm hoping I missed something.
How about an io.ReaderAt built from an io.Reader that reinitializes if you decide to go backwards? (This code is largely untested, but hopefully you get the idea.)
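package main

import (
	"io"
	"os"
	"strings"
)

// inefficientReaderAt adapts a forward-only stream to io.ReaderAt by
// reopening the stream whenever a read targets an earlier offset.
type inefficientReaderAt struct {
	rdr    io.ReadCloser
	cur    int64 // offset of the next byte rdr will return
	initer func() (io.ReadCloser, error)
}

func newInefficientReaderAt(initer func() (io.ReadCloser, error)) *inefficientReaderAt {
	return &inefficientReaderAt{
		initer: initer,
	}
}

// Read reads from the current stream and advances the cursor.
func (r *inefficientReaderAt) Read(p []byte) (n int, err error) {
	n, err = r.rdr.Read(p)
	r.cur += int64(n)
	return n, err
}

func (r *inefficientReaderAt) ReadAt(p []byte, off int64) (n int, err error) {
	// Reopen the stream on first use or on rewind.
	if off < r.cur || r.rdr == nil {
		if r.rdr != nil {
			r.rdr.Close()
		}
		r.cur = 0
		r.rdr, err = r.initer()
		if err != nil {
			r.rdr = nil
			return 0, err
		}
	}
	// Skip forward to off, keeping the cursor in sync.
	if off > r.cur {
		skipped, err := io.CopyN(io.Discard, r.rdr, off-r.cur)
		r.cur += skipped
		if err != nil {
			return 0, err
		}
	}
	// io.ReaderAt must fill p or return an error, so keep reading until full.
	n, err = io.ReadFull(r, p)
	if err == io.ErrUnexpectedEOF {
		err = io.EOF
	}
	return n, err
}

func main() {
	// io.Discard and io.NopCloser need Go 1.16+; older releases have them in io/ioutil.
	r := newInefficientReaderAt(func() (io.ReadCloser, error) {
		return io.NopCloser(strings.NewReader("ABCDEFG")), nil
	})
	io.Copy(os.Stdout, io.NewSectionReader(r, 0, 3)) // reads offsets 0-2
	io.Copy(os.Stdout, io.NewSectionReader(r, 1, 3)) // rewinds, then reads offsets 1-3
}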
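To connect the wrapper to the original question, here is a minimal sketch that feeds a nested entry to zip.NewReader, using f.Open as the reinitializer and f.UncompressedSize64 as the size. The names rewindReaderAt, listNested and buildZip are hypothetical, the wrapper is a condensed variant that reads until the buffer is full, and nested archives are recognised by a ".zip" suffix purely for illustration.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// rewindReaderAt is a condensed variant of the wrapper above.
type rewindReaderAt struct {
	open func() (io.ReadCloser, error)
	rdr  io.ReadCloser
	cur  int64
}

func (r *rewindReaderAt) ReadAt(p []byte, off int64) (int, error) {
	if r.rdr == nil || off < r.cur {
		r.Close()
		rdr, err := r.open()
		if err != nil {
			return 0, err
		}
		r.rdr, r.cur = rdr, 0
	}
	skipped, err := io.CopyN(io.Discard, r.rdr, off-r.cur)
	r.cur += skipped
	if err != nil {
		return 0, err
	}
	n, err := io.ReadFull(r.rdr, p)
	r.cur += int64(n)
	if err == io.ErrUnexpectedEOF {
		err = io.EOF
	}
	return n, err
}

// Close releases the current stream, if any; a later ReadAt reopens it.
func (r *rewindReaderAt) Close() {
	if r.rdr != nil {
		r.rdr.Close()
		r.rdr = nil
	}
}

// listNested returns "outer entry/inner entry" for every ".zip" entry in
// outer, decompressing each nested archive as a stream instead of buffering it.
func listNested(outer []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(outer), int64(len(outer)))
	if err != nil {
		return nil, err
	}
	var names []string
	for _, f := range zr.File {
		if !strings.HasSuffix(f.Name, ".zip") {
			continue
		}
		ra := &rewindReaderAt{open: f.Open}
		inner, err := zip.NewReader(ra, int64(f.UncompressedSize64))
		ra.Close() // the directory is parsed by now; release the stream
		if err != nil {
			return nil, err
		}
		for _, g := range inner.File {
			names = append(names, f.Name+"/"+g.Name)
		}
	}
	return names, nil
}

// buildZip creates a one-entry archive in memory; it only makes the demo runnable.
func buildZip(name string, body []byte) []byte {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	fw, err := w.Create(name)
	if err == nil {
		_, err = fw.Write(body)
	}
	if err == nil {
		err = w.Close()
	}
	if err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	inner := buildZip("x.txt", []byte("x"))
	outer := buildZip("inner.zip", inner)
	names, err := listNested(outer)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

Because a zip directory lives at the end of the archive, NewReader first decompresses the whole nested entry to reach it and then rewinds, so each backward seek costs a fresh decompression pass from the start.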
If you mostly move forwards this probably works OK, especially if you use a buffered reader.
io.ReaderAt comes with guarantees (https://godoc.org/io#ReaderAt) that this wrapper does not fully meet: it keeps a single cursor, so it does not allow parallel calls to ReadAt, and ReadAt is supposed to block until the buffer is full or an error occurs, which holds only if the final read keeps going until then. So this may not even work properly.
I ran into the exact same need and came up with the following approach; not sure if it's any help to you:
import (
	"archive/zip"
	"bytes"
	"io"
)

// NewZipFromReader returns a zip.Reader for file. If file does not implement
// io.ReaderAt, its entire contents are buffered in memory first and size is
// replaced by the length of that buffer.
func NewZipFromReader(file io.ReadCloser, size int64) (*zip.Reader, error) {
	ra, ok := file.(io.ReaderAt)
	if !ok {
		// io.ReadAll needs Go 1.16+; older releases use ioutil.ReadAll.
		buffer, err := io.ReadAll(file)
		if err != nil {
			return nil, err
		}
		ra = bytes.NewReader(buffer)
		size = int64(len(buffer))
	}
	return zip.NewReader(ra, size)
}
So if file doesn't implement io.ReaderAt, it reads the whole contents into a buffer.
It's probably not safe against ZIP bombs, and it will definitely fail with an out-of-memory error for files larger than RAM.
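One way to soften both concerns while keeping the buffering approach is to cap how much of each nested entry is read and how deep the recursion goes. The sketch below is an assumption-laden illustration, not a hardened defence: readCapped, walk, listAll, buildZip, maxDepth and the limit value are all hypothetical, and a per-entry cap does not bound the total memory used across many entries.

```go
package main

import (
	"archive/zip"
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

const maxDepth = 4 // hypothetical nesting limit

var errTooLarge = errors.New("nested archive exceeds the size limit")

// readCapped decompresses f into memory but gives up once more than limit
// bytes have been produced, without trusting the size stored in the header.
func readCapped(f *zip.File, limit int64) ([]byte, error) {
	rc, err := f.Open()
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	data, err := io.ReadAll(io.LimitReader(rc, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, errTooLarge
	}
	return data, nil
}

// walk appends every path in the archive held in data to paths, descending
// into entries whose names end in ".zip" up to maxDepth levels.
func walk(data []byte, prefix string, depth int, limit int64, paths *[]string) error {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return err
	}
	for _, f := range zr.File {
		path := prefix + f.Name
		*paths = append(*paths, path)
		if depth >= maxDepth || !strings.HasSuffix(strings.ToLower(f.Name), ".zip") {
			continue
		}
		inner, err := readCapped(f, limit)
		if err != nil {
			return err
		}
		if err := walk(inner, path+"/", depth+1, limit, paths); err != nil {
			return err
		}
	}
	return nil
}

// listAll returns every path in the archive, nested archives included.
func listAll(data []byte, limit int64) ([]string, error) {
	var paths []string
	err := walk(data, "", 0, limit, &paths)
	return paths, err
}

// buildZip creates a one-entry archive in memory; it only makes the demo runnable.
func buildZip(name string, body []byte) []byte {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	fw, err := w.Create(name)
	if err == nil {
		_, err = fw.Write(body)
	}
	if err == nil {
		err = w.Close()
	}
	if err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	inner := buildZip("x.txt", []byte("x"))
	outer := buildZip("inner.zip", inner)
	paths, err := listAll(outer, 1<<20) // 1 MiB per nested entry, chosen arbitrarily
	if err != nil {
		panic(err)
	}
	fmt.Println(paths)
}
```

Reading one byte past the limit lets readCapped detect an oversized entry from the decompressed stream itself rather than from the declared size, which a crafted archive could misstate.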