While benchmarking the performance of conversion from byte arrays to uint32
s, I noticed that the conversion ran faster when starting with the least significant bits:
package blah
import (
"testing"
"encoding/binary"
"bytes"
)
func BenchmarkByteConversion(t *testing.B) {
var i uint32 = 3419234848
buf := new(bytes.Buffer)
_ = binary.Write(buf, binary.BigEndian, i)
b := buf.Bytes()
for n := 0; n < t.N; n++ {
// Start with least significant bit: 0.27 nanos
value := uint32(b[3]) | uint32(b[2])<<8 | uint32(b[2])<<16 | uint32(b[0])<<24
// Start with most significant bit: 0.68 nanos
// value := uint32(b[0])<<24 | uint32(b[1])<<16 | uint32(b[2])<<8 | uint32(b[3])
_ = value
}
}
When I run go test -bench=.
, I get 0.27 nanos per iteration when computing value
the first way, and 0.68 nanos per iteration when computing value
the second way. Why is it twice as fast to start with the least significant bits when |
ing together numbers?
There is no mystery. Optimization!
package blah
import (
"bytes"
"encoding/binary"
"testing"
)
func BenchmarkByteConversionLeast(t *testing.B) {
var i uint32 = 3419234848
buf := new(bytes.Buffer)
_ = binary.Write(buf, binary.BigEndian, i)
b := buf.Bytes()
for n := 0; n < t.N; n++ {
// Start with least significant bit: 0.27 nanos
value := uint32(b[3]) | uint32(b[2])<<8 | uint32(b[2])<<16 | uint32(b[0])<<24
_ = value
}
}
func BenchmarkByteConversionMost(t *testing.B) {
var i uint32 = 3419234848
buf := new(bytes.Buffer)
_ = binary.Write(buf, binary.BigEndian, i)
b := buf.Bytes()
for n := 0; n < t.N; n++ {
// Start with most significant bit: 0.68 nanos
value := uint32(b[0])<<24 | uint32(b[1])<<16 | uint32(b[2])<<8 | uint32(b[3])
_ = value
}
}
Output:
go test silly_test.go -bench=.
goos: linux
goarch: amd64
BenchmarkByteConversionLeast-4 2000000000 0.72 ns/op
BenchmarkByteConversionMost-4 2000000000 1.80 ns/op
It should be obvious. Bounds check elimination.
Just use common sense. If you check the array bounds for indices 3, 2, 1, and 0 you can stop checking at 3 since, obviously, 2, 1, and 0 will also be valid. If you check the array bounds for indices 0, 1, 2, and 3 you have to check the bounds for them all. One bounds check versus four bounds checks.
Wikipedia: Bounds-checking elimination
You should also be reading good code, like the Go standard library. For example,
func (littleEndian) PutUint64(b []byte, v uint64) {
_ = b[7] // early bounds check to guarantee safety of writes below
b[0] = byte(v)
b[1] = byte(v >> 8)
b[2] = byte(v >> 16)
b[3] = byte(v >> 24)
b[4] = byte(v >> 32)
b[5] = byte(v >> 40)
b[6] = byte(v >> 48)
b[7] = byte(v >> 56)
}