I decided I'd try making my own hashmap (here).
For reads it's 28% slower than the standard library implementation. I'm wondering if it's possible to speed up the following code, Index(), which is the critical path for lookups:
const numOnes = uint8(20)
const ones = uint32(1<<numOnes - 1) // mask of the low 20 bits

func (m *Map) Index(num uint64) uint32 {
	part := uint32(num) & ones  // bucket selector: low 20 bits of the key
	remaining := num >> numOnes // high bits, used for collision resolution
	start := m.starts[part]
	bitsNum := m.bitNums[part]
	matchedBits := bitsNum & uint16(remaining)
	offset := BitScoreCache[bitsNum][matchedBits]
	return start + uint32(offset)
}
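For reference, here is how the mask math splits a key: `ones` selects the low 20 bits as the bucket index, and the rest of the key becomes `remaining`. A minimal runnable sketch (the sample key is hypothetical):

```go
package main

import "fmt"

const numOnes = 20
const ones = uint32(1<<numOnes - 1) // 0xFFFFF, the low 20 bits

// split separates a key into its bucket index (low 20 bits)
// and the remaining high bits used for collision resolution.
func split(num uint64) (part uint32, remaining uint64) {
	return uint32(num) & ones, num >> numOnes
}

func main() {
	part, rem := split(0xABCDE12345)
	fmt.Printf("part=%#x remaining=%#x\n", part, rem) // part=0x12345 remaining=0xabcde
}
```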
Note: BitScoreCache is declared as var BitScoreCache [5000][5000]uint16, and it is intended to be read-only and shared between multiple Map instances.
Example usage:
func (pa PrimeAdvancedAnagrammar) GetAnagrams(word string) []string {
	return pa.m[pa.locator.Index(PrimeProduct(word))] // pa.m is a [][]string
}
Versus the standard library:
func (pba PrimeBasicAnagrammar) GetAnagrams(word string) []string {
	return pba.m[PrimeProduct(word)] // pba.m is a map[uint64][]string
}
What are the main reasons it's slower than the standard library, given how few operations it performs?
Edit: combining the two arrays into one array of structs reduced cache misses and was the single biggest performance improvement. Returning early in the case where a bucket has no collisions also improved performance.
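A sketch of what that combined layout could look like. The struct and field names are assumptions (the actual struct isn't shown), and treating a zero bitNum as "no collisions" is one plausible encoding of the early return, not necessarily the one used:

```go
package main

import "fmt"

const numOnes = 20
const ones = uint32(1<<numOnes - 1) // mask of the low 20 bits

// entry packs the two per-bucket values into one struct so a single
// cache-line fetch serves both loads in Index.
type entry struct {
	start  uint32
	bitNum uint16
}

// Shared read-only score table, as in the question.
var BitScoreCache [5000][5000]uint16

type Map struct {
	entries []entry // replaces the separate starts and bitNums arrays
}

func (m *Map) Index(num uint64) uint32 {
	part := uint32(num) & ones
	remaining := num >> numOnes
	e := m.entries[part]
	// Hypothetical early return: a zero bitNum is assumed here to mean
	// the bucket has no collisions, so the table lookup can be skipped.
	if e.bitNum == 0 {
		return e.start
	}
	matchedBits := e.bitNum & uint16(remaining)
	return e.start + uint32(BitScoreCache[e.bitNum][matchedBits])
}

func main() {
	m := &Map{entries: make([]entry, 1<<numOnes)}
	m.entries[0x12345] = entry{start: 7} // bitNum 0 → early-return path
	fmt.Println(m.Index(0xABCDE12345))   // 7
}
```

With this layout each lookup touches one entry (both fields on the same cache line) instead of two separate arrays, which matches the cache-miss reduction described above.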