I have an array of strings and I need to create a suffix tree out of it in Golang. SuffixArray in Golang does not suffice my needs, because it only accepts byte array (i.e of a single string). Could anybody provide pointers for implementation. Thanks in advance.
Here is an example of how to use suffix array to do auto completion. (playground).
Note that I joined all the strings together with a prefix of \x00
which can't occur in the strings first.
package main
import (
"fmt"
"index/suffixarray"
"regexp"
"strings"
)
func main() {
words := []string{
"aardvark",
"happy",
"hello",
"hero",
"he",
"hotel",
}
// use \x00 to start each string
joinedStrings := "\x00" + strings.Join(words, "\x00")
sa := suffixarray.New([]byte(joinedStrings))
// User has typed in "he"
match, err := regexp.Compile("\x00he[^\x00]*")
if err != nil {
panic(err)
}
ms := sa.FindAllIndex(match, -1)
for _, m := range ms {
start, end := m[0], m[1]
fmt.Printf("match = %q
", joinedStrings[start+1:end])
}
}
Prints
match = "hello"
match = "hero"
match = "he"
What you want is called generalized suffix tree. A simple way to build such trees is to append a different end marker(symbols not used in any of the strings) to each strings, concatenate them and build a normal suffix tree for the concatenated string. So you just need to add "hello world" to the string set and use:
match, err := regexp.Compile("[^\x00]*wor[^\x00]*")
to get the strings contain "wor". Note that the correct string is joinedStrings[start:end]
.