I'm trying to interpret a packfile received from git-upload-pack. git-upload-pack doesn't send the accompanying index, because supposedly you can derive it from the original packfile, but I can't figure out how that's possible with the packfile's format.
The git technical documentation says each entry starts with a variable number of bytes giving the entry size, but this is the uncompressed size of the entry, and the entry data itself is compressed in the packfile with zlib. Go's zlib implementation is greedy and reads past the end of the compressed data from the io.Reader I give it, meaning I can't trust it to leave the reader positioned at the start of the next entry after decompressing the block.
My first thought was to take a bookmark before reading the compressed block from the packfile with compress/zlib, reset to the bookmark after reading, recompress the uncompressed data with the same algorithm/compression level so that I know the length of the compressed data, and then seek forward that far to get to the right offset for the next block.
However, the recompressed data doesn't seem to be identical to the original compressed data. Why would the same data compressed with the same algorithm produce different results? And is there a better way to calculate the indexes of entries into a git packfile?
I've solved my problem in a different way: I modified compress/zlib to expose the digest from the zlib reader. After decompressing, I seek backwards in the original io.ReadSeeker to find the 4-byte Adler-32 digest that zlib appends to the stream as a checksum of the uncompressed data, so that I know where the end of the compressed data stream was.
I still don't have an answer for why git and Go's zlib algorithm would produce different results with the same compression level, though.