I'm attempting to check whether the first character in a string matches any of the following; note the UTF-8 quote characters:
c := t.Content[0]
if c != '.' && c != ',' && c != '?' && c != '“' && c != '”' {
	// ...
}
This code does not work due to the special characters in the last two checks.
What is the correct way to do this?
Indexing a string indexes its bytes (Go strings hold bytes, and a string literal holds the UTF-8 encoding of its text), but you want to test the first character. In your code c is therefore a byte, and rune constants such as '“' (U+201C) and '”' (U+201D) do not fit in a single byte, so those comparisons can't work. You should get the first rune, not the first byte. For efficiency you may use utf8.DecodeRuneInString(), which decodes only the first rune. If you need all the runes of the string, you may use a type conversion like all := []rune("I'm a string").
See this example:
package main

import (
	"fmt"
	"unicode/utf8"
)
func main() {
	for _, s := range []string{"asdf", ".asdf", "”asdf"} {
		c, _ := utf8.DecodeRuneInString(s) // decode only the first rune
		if c != '.' && c != ',' && c != '?' && c != '“' && c != '”' {
			fmt.Println("Ok:", s)
		} else {
			fmt.Println("Not ok:", s)
		}
	}
}
Output (try it on the Go Playground):
Ok: asdf
Not ok: .asdf
Not ok: ”asdf
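If the list of characters grows, a switch on the decoded rune may read better than a chain of && conditions. The sketch below wraps the same check in a helper; the name startsWithPunct is hypothetical. It also tries an empty string: utf8.DecodeRuneInString returns (utf8.RuneError, 0) for "", which matches none of the cases, so the helper reports false.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// startsWithPunct (hypothetical helper) reports whether s begins with one of
// the characters from the question: . , ? “ ”
func startsWithPunct(s string) bool {
	// For "" DecodeRuneInString returns (utf8.RuneError, 0); RuneError is not
	// listed below, so an empty string yields false.
	c, _ := utf8.DecodeRuneInString(s)
	switch c {
	case '.', ',', '?', '“', '”':
		return true
	}
	return false
}

func main() {
	for _, s := range []string{"asdf", ".asdf", "”asdf", ""} {
		fmt.Printf("%q starts with punctuation: %v\n", s, startsWithPunct(s))
	}
}
```

strings.ContainsRune(".,?“”", c) would be another way to express the same membership test on the decoded rune.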
Adding to @icza's great answer: it's worth noting that while indexing a string yields bytes, ranging over a string yields runes (the loop decodes the UTF-8 as it goes). So the following also works:
package main

import "fmt"

func main() {
	for _, s := range []string{"asdf", ".asdf", "”asdf"} {
		for _, c := range s { // c is a rune, not a byte
			if c != '.' && c != ',' && c != '?' && c != '“' && c != '”' {
				fmt.Println("Ok:", s)
			} else {
				fmt.Println("Not ok:", s)
			}
			break // we break after the first character regardless
		}
	}
}
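One behavioral difference between the two approaches is the empty string: utf8.DecodeRuneInString("") returns utf8.RuneError, so the first version prints "Ok:" for it, while with range the loop body never runs and nothing is printed. The sketch below (runeStarts is a hypothetical helper) shows what range actually hands you: the byte offset where each rune begins, plus the decoded rune.

```go
package main

import "fmt"

// runeStarts (hypothetical helper) records the byte offset at which each
// rune of s begins, as reported by a range loop.
func runeStarts(s string) []int {
	var offsets []int
	for i := range s { // i advances by the UTF-8 width of each rune, not by 1
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "”a?"
	for i, r := range s {
		fmt.Printf("offset %d: %q\n", i, r)
	}
	fmt.Println(runeStarts(s))  // [0 3 4]
	fmt.Println(runeStarts("")) // []: the loop body never runs for ""
}
```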