I've been trying to draw a digit from the MNIST database to an image file. In my first attempts, the generated image seemed shifted, as follows:
I know that the training file consists of 60,000 images, each 28 x 28 pixels, which the file represents as an array of 28 x 28 x 60,000 uint8 values. That should give a length of 47,040,000 bytes.
However, when I print the length of the file, I get 47,040,016. Those 16 extra bytes are what cause the image to shift.
This is the code I used. The constant imgNum is the offset of the image I want to print: the image's index multiplied by the size of one image. I don't see anything unusual in how the images file is read.
package main

import (
	"image"
	"image/color"
	"image/png"
	"io/ioutil"
	"os"
)

const (
	imgSideLength = 28
	imgSize       = imgSideLength * imgSideLength
	imgNum        = 499 * imgSize
)

var images []uint8

func main() {
	images, err := ioutil.ReadFile("train-images")
	check(err)
	canvas := image.NewRGBA(image.Rect(0, 0, imgSideLength, imgSideLength))
	pixelIndex := imgNum
	for i := 0; i < imgSideLength; i++ {
		for j := 0; j < imgSideLength; j++ {
			currPixel := images[pixelIndex]
			pixelIndex++
			pixelColor := color.RGBA{currPixel, currPixel, currPixel, 255}
			canvas.Set(j, i, pixelColor)
		}
	}
	numFile, err := os.Create("number.png")
	check(err)
	defer numFile.Close()
	png.Encode(numFile, canvas)
}

func check(e error) {
	if e != nil {
		panic(e)
	}
}
Knowing that those 16 extra bytes are what cause the image to shift, I decided to modify imgNum:
imgNum = 499 * imgSize + 16
With that change, the image draws fine.
But I would still like to know: why are there 16 extra bytes where there shouldn't be any?
Looking at the MNIST web site, you can see that the data format for the training image file is:
[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000803(2051) magic number
0004     32 bit integer  60000            number of images
0008     32 bit integer  28               number of rows
0012     32 bit integer  28               number of columns
0016     unsigned byte   ??               pixel
0017     unsigned byte   ??               pixel
........
xxxx     unsigned byte   ??               pixel
This means the first 16 bytes are header information: four 32-bit integers at 4 bytes each. The pixel data only starts at offset 16, which is why adding 16 to imgNum fixes the shift.