I want to replace all the numbers in a string with zeros, and ideally consecutive numbers should be replaced with one zero. For example, abc826def47 should become abc0def0.
I have tried two methods:
Using regex:
import "regexp"

var numbersRegExp = regexp.MustCompile("[0-9]+")

func normalizeNumbers(str string) string {
	return numbersRegExp.ReplaceAllString(str, "0")
}
Using strings.Replace:
import s "strings"

func normalizeNumbers(str string) string {
	str = s.Replace(str, "1", "0", -1)
	str = s.Replace(str, "2", "0", -1)
	str = s.Replace(str, "3", "0", -1)
	str = s.Replace(str, "4", "0", -1)
	str = s.Replace(str, "5", "0", -1)
	str = s.Replace(str, "6", "0", -1)
	str = s.Replace(str, "7", "0", -1)
	str = s.Replace(str, "8", "0", -1)
	str = s.Replace(str, "9", "0", -1)
	str = s.Replace(str, "00", "0", -1)
	return str
}
The second method, which avoids regex, seems to be a little faster, but it is still very slow when working with about 100k strings, and it does not collapse consecutive digits correctly.
Is there a better way of doing this?
The fastest solution is (always) to build the output on the fly. This requires looping over the runes of the input only once, and with a properly sized initial output "buffer" (a []rune in this case) you can also avoid reallocation.
Here is the implementation:
func repNums(s string) string {
	// len(s) is the byte count, not the rune count; it is only an upper estimate.
	out := make([]rune, len(s))
	// added reports whether a '0' was already written for the current run of digits.
	i, added := 0, false
	for _, r := range s {
		if r >= '0' && r <= '9' {
			if added {
				continue
			}
			added, out[i] = true, '0'
		} else {
			added, out[i] = false, r
		}
		i++
	}
	return string(out[:i])
}
Testing it:
fmt.Printf("%q\n", repNums("abc826def47")) // "abc0def0"
fmt.Printf("%q\n", repNums("1234"))        // "0"
fmt.Printf("%q\n", repNums("asdf"))        // "asdf"
fmt.Printf("%q\n", repNums(""))            // ""
fmt.Printf("%q\n", repNums("a12b34c9d"))   // "a0b0c0d"
Try it on the Go Playground.
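To check the speed difference on your own inputs, here is a rough benchmark sketch that compares the regexp version from the question with repNums. It uses testing.Benchmark so that it can run as an ordinary program. The sample input and the sink variable are assumptions made for illustration, and no particular timings are claimed here; the numbers will depend on your data and machine.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

var numbersRegExp = regexp.MustCompile("[0-9]+")

// normalizeNumbers is the regexp-based version from the question.
func normalizeNumbers(str string) string {
	return numbersRegExp.ReplaceAllString(str, "0")
}

// repNums is the single-pass version from the answer.
func repNums(s string) string {
	out := make([]rune, len(s)) // byte count, an upper estimate of the rune count
	i, added := 0, false
	for _, r := range s {
		if r >= '0' && r <= '9' {
			if added {
				continue
			}
			added, out[i] = true, '0'
		} else {
			added, out[i] = false, r
		}
		i++
	}
	return string(out[:i])
}

// sink keeps the results alive so the calls are not optimized away.
var sink string

func main() {
	const input = "abc826def47" // assumed sample; use strings typical of your data

	re := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = normalizeNumbers(input)
		}
	})
	loop := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = repNums(input)
		}
	})

	fmt.Println("regexp:", re)
	fmt.Println("loop:  ", loop)
}
```

Each result line shows the iteration count and the time per operation, so the two approaches can be compared directly.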
Notes:
The output buffer is sized with len(s), which is the byte count of the input, not its rune count. This is an upper estimate but requires no effort. You may use utf8.RuneCountInString() to get the exact number of runes in the input string if you wish (but this decodes and loops over the runes of the input string, so it is not really worth it).
Digits are tested with the condition r >= '0' && r <= '9'. Alternatively, you may use unicode.IsDigit(), which also accepts non-ASCII decimal digits.
string
(which is immutable).
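The alternatives named in the notes, utf8.RuneCountInString() and unicode.IsDigit(), can be combined into a variant like the one below. This is only a sketch: the name repNumsUnicode is made up for illustration, and it behaves differently from repNums on purpose, because unicode.IsDigit() also treats non-ASCII decimal digits (for example Arabic-Indic digits) as digits.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// repNumsUnicode is a hypothetical variant of repNums. It sizes the buffer
// with the exact rune count and collapses every run of Unicode decimal
// digits into a single '0'.
func repNumsUnicode(s string) string {
	// Exact capacity; this costs one extra decoding pass over the input.
	out := make([]rune, 0, utf8.RuneCountInString(s))
	inDigits := false // true while we are inside a run of digits
	for _, r := range s {
		if unicode.IsDigit(r) {
			if !inDigits {
				out = append(out, '0')
				inDigits = true
			}
			continue
		}
		inDigits = false
		out = append(out, r)
	}
	return string(out)
}

func main() {
	fmt.Printf("%q\n", repNumsUnicode("abc826def47"))    // "abc0def0"
	fmt.Printf("%q\n", repNumsUnicode("a\u0661\u0662b")) // "a0b" (Arabic-Indic digits)
	fmt.Printf("%q\n", repNumsUnicode("asdf"))           // "asdf"
}
```

If only ASCII digits should be replaced, keep the r >= '0' && r <= '9' test from the original function.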