We can get the number of runes in a string by converting the string to a rune slice and taking the length of that slice.
s := "世界"
runes := []rune(s)
fmt.Println(len(runes))
Or use the RuneCountInString function in the unicode/utf8 package:
fmt.Println(utf8.RuneCountInString(s))
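Here is a minimal runnable program that puts the two approaches side by side. It uses only the standard library. The helper names viaRuneSlice and viaUTF8 are hypothetical and exist only for this illustration. Both report 2 for "世界", even though the string is 6 bytes long.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// viaRuneSlice (hypothetical helper) converts s to a []rune and
// returns the length of that slice.
func viaRuneSlice(s string) int {
	runes := []rune(s)
	return len(runes)
}

// viaUTF8 (hypothetical helper) counts runes without building a slice.
func viaUTF8(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	s := "世界"
	fmt.Println(len(s))          // 6: number of bytes, not runes
	fmt.Println(viaRuneSlice(s)) // 2
	fmt.Println(viaUTF8(s))      // 2
}
```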
What's the difference between the two?
The difference is that the first one:
runes := []rune(s)
length := len(runes)
has to step through s to build a slice of runes and then ask that slice how long it is, whereas utf8.RuneCountInString simply steps through s byte by byte, incrementing a counter whenever it sees a sequence of contiguous bytes that make up a UTF-8 character.
The []rune(s) version has to do more work than utf8.RuneCountInString does.
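One way to see part of that extra work is to count heap allocations. The sketch below uses testing.AllocsPerRun from the standard library. It also relies on hypothetical package-level sink variables, which force the converted slice onto the heap and keep the results from being discarded. On a typical gc toolchain, the []rune version should report at least one allocation per call and utf8.RuneCountInString none. The exact numbers depend on the compiler version and its escape analysis, so treat them as indicative.

```go
package main

import (
	"fmt"
	"testing"
	"unicode/utf8"
)

// Hypothetical package-level sinks: storing results in globals keeps
// the work from being optimized away, and storing the slice in a
// global forces its backing array to escape to the heap.
var (
	runeSink  []rune
	countSink int
)

// allocsRuneSlice reports average heap allocations per []rune(s) call.
func allocsRuneSlice(s string) float64 {
	return testing.AllocsPerRun(100, func() {
		runeSink = []rune(s)
	})
}

// allocsRuneCount reports average heap allocations per
// utf8.RuneCountInString(s) call.
func allocsRuneCount(s string) float64 {
	return testing.AllocsPerRun(100, func() {
		countSink = utf8.RuneCountInString(s)
	})
}

func main() {
	s := "世界"
	fmt.Println("[]rune(s):", allocsRuneSlice(s))
	fmt.Println("utf8.RuneCountInString(s):", allocsRuneCount(s))
}
```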
A cursory bit of wandering around the source suggests that []rune(someString) is implemented by stringtoslicerune, which actually does two iterations over the string: one to find out how many runes there are and another to copy those runes into a slice. I'm not certain about this, as I'm not that familiar with the implementation details of Go.
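The two-pass approach described above can be sketched in plain Go. This paraphrases the idea and assumes the runtime works this way. It is not the runtime's actual code, and toRunes is a hypothetical name. Ranging over a string decodes one rune per iteration, so each loop below is one full pass over s.

```go
package main

import "fmt"

// toRunes (hypothetical) mimics the two-pass idea: first count the
// runes, then allocate a slice of exactly that length and fill it.
func toRunes(s string) []rune {
	// Pass 1: count runes by decoding the whole string.
	n := 0
	for range s {
		n++
	}
	// Pass 2: decode again, copying each rune into the slice.
	a := make([]rune, n)
	i := 0
	for _, r := range s {
		a[i] = r
		i++
	}
	return a
}

func main() {
	s := "世界"
	fmt.Println(len(toRunes(s)), string(toRunes(s)) == s) // 2 true
}
```

Counting first lets the slice be allocated once at its exact final size, at the cost of decoding the string twice.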