I am trying to create a regex that matches a string if it contains 3 or more of the same character in a row (e.g. aaaaaa, testtttttt, otttttter).
I have tried the following:
regexp.Compile("[A-Za-z0-9]{3,}")
regexp.Compile("(.){3,}")
regexp.Compile("(.)\\1{3,}")
These match any 3 characters in a row, but not 3 consecutive occurrences of the same character. Where am I going wrong?
What you're asking for cannot be expressed concisely with true regular expressions; what you need are (irregular) backreferences. While many regexp engines implement them, Go's regexp package, which follows the design and syntax of RE2, does not. RE2 is a fast regexp engine that guarantees linear-time string processing, and there is no known way to implement backreferences with that efficiency. (See https://swtch.com/~rsc/regexp/ for further information.)
To solve your problem you could look for another regexp library. I believe PCRE bindings for Go exist, but I have no personal experience with them.
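As a quick check of the point above, the sketch below asks Go's regexp package whether it accepts two patterns. The helper name compiles is made up for illustration. Note that in an engine that does support backreferences, the pattern for "3 or more" would be (.)\1{2,} (one captured character plus at least two repeats), not \1{3,}.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
// (Hypothetical helper, used only for this demonstration.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// Accepted, but it matches any 3 characters, not 3 identical ones.
	fmt.Println(compiles(`(.){3,}`))
	// Rejected at compile time: the RE2 syntax has no backreferences.
	fmt.Println(compiles(`(.)\1{2,}`))
}
```

The snippets in the question discard the error returned by regexp.Compile, which is probably why the third attempt looked like a pattern that simply never matched.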
Another approach would be to parse the string manually without using (ir)regular expressions.
Here is an ugly solution, which you could generate automatically:
A{3,}|B{3,}|...|Z{3,}|a{3,}|b{3,}|...|z{3,}|0{3,}|1{3,}|...|9{3,}
Due to the problems stated, I eventually settled on the following non-regex solution:
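norm := "this it a ttttt"
repeatCount := 1 // length of the current run of identical characters
thresh := 3      // run length that counts as a match
lastChar := ""
for _, r := range norm {
	c := string(r)
	if c == lastChar {
		repeatCount++
		if repeatCount == thresh {
			// Found thresh identical characters in a row; stop scanning.
			break
		}
	} else {
		// A different character starts a new run.
		repeatCount = 1
	}
	lastChar = c
}
// After the loop, repeatCount == thresh exactly when a run was found.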
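For reuse, the same loop could be wrapped in a function that returns the result directly. This is only a sketch of that idea: hasRun is a made-up name, and it compares runes instead of one-character strings.

```go
package main

import "fmt"

// hasRun reports whether s contains at least n identical runes in a row.
// (Hypothetical helper; for n <= 1 any non-empty string qualifies.)
func hasRun(s string, n int) bool {
	count := 0
	var last rune
	for i, r := range s {
		if i > 0 && r == last {
			count++
		} else {
			count = 1 // a different rune starts a new run
		}
		if count >= n {
			return true
		}
		last = r
	}
	return false
}

func main() {
	for _, s := range []string{"aaaaaa", "testtttttt", "otttttter", "this it a tt"} {
		fmt.Printf("%q: %v\n", s, hasRun(s, 3))
	}
}
```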