Go: problems after appending to a []byte, writing it to a file, and reading it back

I'm trying to parse lots of IPs (~20 MB, or 4 million IPs), store them as bytes in a file, and read them back later.

The issue I'm having is that I expect them to be stored in sorted order, but I'm seeing random byte slices which look like mangled IPs when reading them back.

// Let this be called generator.go

var buf []byte


// So this is where we build up `buf`, which we later write to a file.
func writeOut(record RecordStruct) {
    // This line is never hit. All slices have a length of 4, as expected
    if len(record.IPEnd.Bytes()) != 4 {
        fmt.Println(len(record.IPEnd.Bytes()), record.IPEnd.Bytes())
    }

    // Append the IP to the byte slice, followed by a separator of 10 null bytes that we will later pass to bytes.Split.
    buf = append(buf, append(record.IPEnd.Bytes(), bytes.Repeat([]byte{0}, 10)...)...)
}

func main () {
    // Called many times. For brevity I won't include all of that logic. 
    // There are no Goroutines in the code and running with -race says all is fine.

    writeOut(...)

    if err := ioutil.WriteFile("bin/test", buf, 0644); err != nil {
        fmt.Println("Could not write bin/test:", err)
    }
}

// Let this be called reader.go

func main() {
    bytez, err := ioutil.ReadFile("bin/test")

    if err != nil {
        fmt.Println("Asset was not found.")
    }

    haystack := bytes.Split(bytez, bytes.Repeat([]byte{0}, 10))

    for _, needle := range haystack {
        // Gets hit maybe 10% of the time. The logs are below.
        if len(needle) != 4 {
            fmt.Println(fmt.Println(needle))
        }
    }
}

[188 114 235]
14 <nil>
[120 188 114 235 121]
22 <nil>
[188 148 98]
13 <nil>
[120 188 148 98 121]
21 <nil>

As you can see, the slices have either too few or too many bytes to be IPs.

If I change the log to better illustrate the issue, it looks like the last octet overflows?

Fine: [46 36 202 235]
Fine: [46 36 202 239]
Fine: [46 36 202 255]
Weird: [46 36 203]
Weird: [0 46 36 203 1]
Fine: [46 36 203 3]
Fine: [46 36 203 5]
Fine: [46 36 203 7]
Fine: [46 36 203 9]

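The code does not split the bytes correctly when an IP address ends with a zero byte: the trailing zero merges with the ten-byte separator, so bytes.Split cuts one byte early and the leftover zero ends up at the front of the next record. Fix this by converting each address to its 16-byte representation and storing fixed 16-byte records with no delimiters.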

You can efficiently append a mix of v4 and v6 addresses to the buffer using the following:

switch len(p) {
case net.IPv6len: 
    buf = append(buf, p...)
case net.IPv4len:
    buf = append(buf, v4InV6Prefix...)
    buf = append(buf, p...)
default:
    // handle error
}

where v4InV6Prefix is a package-level variable with the value []byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff}.

Read the file as v6 addresses:

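buf, err := ioutil.ReadFile(xxx)
if err != nil {
    // handle error
}
// Step through the buffer one 16-byte record at a time. The loop condition
// skips a trailing partial record instead of panicking on it.
for i := 0; i+net.IPv6len <= len(buf); i += net.IPv6len {
    addr := net.IP(buf[i : i+net.IPv6len])
    // do something with addr
}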

Note that it's also possible to read and write the file incrementally using an io.Reader and an io.Writer. The code in this answer matches the code in the question, where the application reads and writes the file in one go.

Since you have no reserved bytes (as you've seen, the byte 0 appears in your legitimate data), you've got a couple of options:

  • If all your values are the same size, or can be made the same size, skip the delimiter and just count off the appropriate number of bytes per value.
  • Reserve a byte and escape it in some way wherever it occurs in your data - e.g. base64 encode your values and use a 0 byte as the delimiter (since the byte 0 never appears in base64 output).
  • Prefix each value with a byte (or some fixed number of bytes) to indicate how long the value is. e.g. you could handle IPv4 and IPv6 addresses with a single byte prefix.

The first is the simplest, and most efficient for values of all the same length. The last is the most flexible & most efficient for values of varying lengths.
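For the length-prefix option, a minimal sketch might look like the following. The names `appendRecord` and `splitRecords` are made up for illustration, and the single-byte prefix is an assumption that limits each value to 255 bytes, which is plenty for 4-byte and 16-byte addresses.

```go
package main

import (
	"errors"
	"fmt"
)

// appendRecord (hypothetical name) appends a one-byte length followed by the value.
func appendRecord(buf, val []byte) ([]byte, error) {
	if len(val) > 255 {
		return buf, errors.New("value too long for a one-byte length prefix")
	}
	buf = append(buf, byte(len(val)))
	return append(buf, val...), nil
}

// splitRecords (hypothetical name) decodes a buffer produced by appendRecord.
// The returned slices alias buf rather than copying it.
func splitRecords(buf []byte) ([][]byte, error) {
	var recs [][]byte
	for len(buf) > 0 {
		n := int(buf[0])
		if len(buf) < 1+n {
			return nil, errors.New("truncated record")
		}
		recs = append(recs, buf[1:1+n])
		buf = buf[1+n:]
	}
	return recs, nil
}

func main() {
	var buf []byte
	var err error
	for _, v := range [][]byte{
		{46, 36, 203, 0}, // IPv4 address ending in a zero byte
		{0x20, 0x01, 0x0d, 0xb8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1}, // IPv6 address
	} {
		if buf, err = appendRecord(buf, v); err != nil {
			fmt.Println(err)
			return
		}
	}
	recs, err := splitRecords(buf)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, r := range recs {
		fmt.Println(len(r), r)
	}
}
```

Since the decoder never searches for a delimiter, zero bytes inside a value cannot be mistaken for a record boundary.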