高朗的地图集

If I have a struct like:

type Foo struct {
  title string
  Tags map[string]string
}

How might approach maintaining a unique set of such structs? From what I understand, although struct equality is a thing - map equality isn't. This means I can't compare my above structs. Therefore I can't just implement the map as set pattern.

The two options that might work I can think of are: convert the Tags to a sorted [][]string or use reflect.Deepequal. Anyone have a better idea?

Depending on what you are doing, one option might be to store the structs as the values in the map rather than keys. For this to work, you will need to create some way to generate a unique key from each struct value though.

Something like this might work:

// Doesn't have to be a string: just has to be suitable for use as a map key.
func (foo *Foo) key() string {
    return key_string
}

fooSet := make(map[string] *Foo)

// Store a Foo
fooSet[x.key()] = x

// Check if x is in the set:
if fooSet[x.key()] != nil {
    println("x is in the set")
}

How well this works will depend on how efficient you can derive a key for your struct.

There are a few ways of implementing this. James Henstridge actually had a good idea, and I attempted to implement it. It performed pretty poorly just to use map in the first place, without my own hash algorithm.

The way I solved this problem is just keep an array of your structs and then remove any duplicates as you insert them.

package structset

type Foo struct {
  title string
  Tags  map[string]string
}

func (f Foo) Equals(f2 Foo) bool {
  if f.title != f2.title {
    return false
  }

  if len(f.Tags) != len(f2.Tags) {
    return false
  }

  for k, v := range f.Tags {
    if w, ok := f2.Tags[k]; !ok || v != w {
      return false
    }
  }

  return true
}

type FooSet []Foo

func (this FooSet) Add(value Foo) {
  if !this.Contains(value) {
    this = append(this, value)
  }
}

func (this FooSet) Length() int {
  return len(this)
}

func (this FooSet) Contains(f Foo) bool {
  for _, v := range this {
    if v.Equals(f) {
      return true
    }
  }
  return false
}

func NewSet() FooSet {
  return FooSet(make([]Foo, 0, 100))
}

I benchmarked this on my i7-3770K Windows machine and got:

BenchmarkSmallSetWithFewCollisions         50000             46615 ns/op
BenchmarkSmallSetWithMoreCollisions        50000             46575 ns/op
BenchmarkSmallSetWithManyCollisions        50000             46605 ns/op
BenchmarkMediumSetWithFewCollisions         1000           2335296 ns/op
BenchmarkMediumSetWithMoreCollisions        1000           2352298 ns/op
BenchmarkMediumSetWithManyCollisions        1000           2336796 ns/op
BenchmarkLargeSetWithFewCollisions            50          46805944 ns/op
BenchmarkLargeSetWithMoreCollisions           50          47376016 ns/op
BenchmarkLargeSetWithManyCollisions           50          46815946 ns/op

To eek out a very small amount of performance, you can insert all your data into the array first, and then remove all duplicates after.

The remove duplicates code is:

func (this FooSet) RemoveDuplicates() {
  length := len(this) - 1
  for i := 0; i < length; i++ {
    for j := i + 1; j <= length; j++ {
      if this[i].Equals(this[j]) {
        this[j] = this[length]
        this = this[0:length]
        length--
        j--
      }
    }
  }
}

The benchmarks for this is:

BenchmarkSmallSetWithFewCollisions         50000             45245 ns/op
BenchmarkSmallSetWithMoreCollisions        50000             45615 ns/op
BenchmarkSmallSetWithManyCollisions        50000             45555 ns/op
BenchmarkMediumSetWithFewCollisions         1000           2294791 ns/op
BenchmarkMediumSetWithMoreCollisions        1000           2309293 ns/op
BenchmarkMediumSetWithManyCollisions        1000           2286290 ns/op
BenchmarkLargeSetWithFewCollisions            50          46235870 ns/op
BenchmarkLargeSetWithMoreCollisions           50          46515906 ns/op
BenchmarkLargeSetWithManyCollisions           50          45865824 ns/op

Here is the benchmark of just assigning Foo to a map[string]Foo.

BenchmarkSmallSetWithFewCollisions         50000             65718 ns/op
BenchmarkSmallSetWithMoreCollisions        50000             64238 ns/op
BenchmarkSmallSetWithManyCollisions        50000             55016 ns/op
BenchmarkMediumSetWithFewCollisions          500           3429435 ns/op
BenchmarkMediumSetWithMoreCollisions         500           3117395 ns/op
BenchmarkMediumSetWithManyCollisions        1000           2826858 ns/op
BenchmarkLargeSetWithFewCollisions            20          82635495 ns/op
BenchmarkLargeSetWithMoreCollisions           20          85285830 ns/op
BenchmarkLargeSetWithManyCollisions           20          73659350 ns/op

It seems to me even if a map was hashable, it still wouldn't perform very well.

Are you sure your example works? I believe you have to pass a pointer to the Add() method for your code to work. Anyways, here is my implementation:

package math

// types

type IntPoint struct {
    X, Y int
}

// set implementation for small number of items
type IntPointSet struct {
    slice []IntPoint 
}

// functions

func (p1 IntPoint) Equals(p2 IntPoint) bool {
    return (p1.X == p2.X) && (p1.Y == p2.Y)
}

func (set *IntPointSet) Add(p IntPoint) {
    if ! set.Contains(p) {
        set.slice = append(set.slice, p)
    }
}

func (set IntPointSet) Contains(p IntPoint) bool {
  for _, v := range set.slice {
    if v.Equals(p) {
      return true
    }
  }
  return false
}

func (set IntPointSet) NumElements() int {
    return len(set.slice)
}

func NewIntPointSet() IntPointSet {
  return IntPointSet{(make([]IntPoint, 0, 10))}
}