I often need to remove duplicates from a slice based on an arbitrary equality function. I need an implementation that:
Can this be implemented in Go? Why is such a function not part of any library I am aware of? For example, I looked at godash (github.com/zillow/godash); its implementation uses a map and does not allow arbitrary less and equal functions.
Here is approximately how it should look. Test:
import (
"reflect"
"testing"
)
type bla struct {
ID string
}
type blas []bla
func (slice blas) Less(i, j int) bool {
return slice[i].ID < slice[j].ID
}
func (slice blas) EqualID(i, j int) bool {
return slice[i].ID == slice[j].ID
}
func Test_Unique(t *testing.T) {
input := []bla{bla{ID: "d"}, bla{ID: "a"}, bla{ID: "b"}, bla{ID: "a"}, bla{ID: "c"}, bla{ID: "c"}}
expected := []bla{bla{ID: "a"}, bla{ID: "b"}, bla{ID: "c"}, bla{ID: "d"}}
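// A callee cannot change the length of the caller's slice, so the result is assigned back.
input = Unique(input, blas(input).Less, blas(input).EqualID)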
if !reflect.DeepEqual(expected, input) {
t.Errorf("2: Expected: %v but was %v\n", expected, input)
}
}
What I think will be needed to implement this:
In a tree implementation, you can store only the index in each tree node; nodes are then compared using the Equal/Less functions defined on the interface.
Here is an example with a tree; here is the play link.
You have to add more functions to make it usable, and the code is not cache friendly, so you may want to improve it in that respect.
Set, for other set operations:
type Set struct {
T Tree
Slice Setter
}
func NewSet(slice Setter) *Set {
set := new(Set)
set.T = Tree{nil, 0, nil}
set.Slice = slice
for i := 0; i < slice.Len(); i++ {
insert(&set.T, slice, i)
}
return set
}
type Setter interface {
Len() int
At(int) (interface{},error)
Less(int, int) bool
Equal(int, int) bool
}
// A Tree is a binary search tree whose values are indices into a Setter.
type Tree struct {
Left *Tree
Value int
Right *Tree
}
func insert(t *Tree, Setter Setter, index int) *Tree {
if t == nil {
return &Tree{nil, index, nil}
}
if Setter.Equal(t.Value, index) {
return t
}
// Smaller elements go left, so an in-order walk yields ascending order.
if Setter.Less(index, t.Value) {
t.Left = insert(t.Left, Setter, index)
return t
}
t.Right = insert(t.Right, Setter, index)
return t
}
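A Bloom filter is frequently used for membership tests. One example is https://github.com/willf/bloom, which has earned some stars on GitHub. This particular implementation uses murmur3 for hashing and a bitset for the filter, so it can be more memory-efficient than a map. Keep in mind that a Bloom filter can report false positives (but never false negatives), so a positive answer needs an exact check before an element is dropped as a duplicate.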