I am writing a program that, among other things, identifies and counts unique entries in a large table file (on the order of GB compressed). My current approach consists of recording every entry name in a large map[string]uint and adjusting the count when I find a repeated entry. This works fine, but while running, my program allocates 8 GB of RAM, and I would like to reduce this footprint.
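To give an idea, my current approach looks roughly like this (a simplified sketch: the real input parsing is more involved, and the file name and the way the entry name is extracted are just placeholders):

package main

import (
	"bufio"
	"compress/gzip"
	"fmt"
	"os"
)

// Simplified sketch of the current approach: every entry name goes into a
// map, and the unique count only grows the first time a name is seen.
func main() {
	f, err := os.Open("table.gz") // placeholder file name
	if err != nil {
		panic(err)
	}
	defer f.Close()

	gz, err := gzip.NewReader(f)
	if err != nil {
		panic(err)
	}
	defer gz.Close()

	seen := make(map[string]uint)
	unique := 0

	sc := bufio.NewScanner(gz)
	for sc.Scan() {
		name := sc.Text() // in reality the entry name is one field of the row
		if seen[name] == 0 {
			unique++ // first time this entry appears
		}
		seen[name]++
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}
	fmt.Println("unique entries:", unique)
}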
I've looked into key-value databases to store my data, because they can page data out to disk and reduce the memory footprint. Unfortunately, when running some benchmarks, I found that most databases had an even higher footprint and much slower performance. Do you have any suggestions?
BenchmarkMapRegister-4        500000       4678 ns/op     134 B/op    2 allocs/op
BenchmarkInsertBolt-4            100   23396720 ns/op   16524 B/op   50 allocs/op
BenchmarkInsertKV-4            10000     411216 ns/op
BenchmarkInsertGKVlite-4       30000      56059 ns/op     184 B/op    5 allocs/op
BenchmarkInsertBunt-4         100000      12795 ns/op     515 B/op    8 allocs/op
BenchmarkInsertBigCache-4     300000       4132 ns/op
BenchmarkInsertLevelDB-4       50000      28036 ns/op     497 B/op   10 allocs/op
PASS
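For reference, the map benchmark was essentially the following (a simplified sketch in a _test.go file; the key generation shown here is illustrative, my real benchmark feeds in entry names from the table, and the Insert* benchmarks do the equivalent put/set call against each store instead):

package main

import (
	"strconv"
	"testing"
)

// Simplified sketch of the map benchmark: register b.N distinct keys
// and report allocations per operation.
func BenchmarkMapRegister(b *testing.B) {
	seen := make(map[string]uint)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		key := "entry-" + strconv.Itoa(i) // illustrative key shape
		seen[key]++
	}
}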
P.S. I don't need to keep track of my keys; I just need to be able to identify whether each key was previously recorded.
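To make that concrete, the only operation I really perform per entry is a membership test of this shape (a minimal sketch):

package main

import "fmt"

// Minimal sketch of the only per-entry operation I need:
// "have I recorded this name before?" The stored values are never read back.
func main() {
	seen := make(map[string]struct{})
	unique := 0
	for _, name := range []string{"alpha", "beta", "alpha"} {
		if _, ok := seen[name]; !ok {
			seen[name] = struct{}{}
			unique++
		}
	}
	fmt.Println("unique entries:", unique) // prints 2
}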