The following code:
package main

import "fmt"

func main() {
	str := "s"
	for i, v := range str {
		fmt.Printf("type of s[%v]: %T\n", i, str[i])
		fmt.Printf("type of v: %T\n", v)
	}
}
yields:
type of s[0]: uint8
type of v: int32
In most languages, strings consist of signed or unsigned 8-bit characters. Why is v of type int32 instead of uint8?
The Go Programming Language Specification
For statements with range clause
For a string value, the "range" clause iterates over the Unicode code points in the string starting at byte index 0. On successive iterations, the index value will be the index of the first byte of successive UTF-8-encoded code points in the string, and the second value, of type rune, will be the value of the corresponding code point. If the iteration encounters an invalid UTF-8 sequence, the second value will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.
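The invalid-sequence rule in the quoted paragraph can be checked with a short program. This is a minimal sketch: rangeSteps is a hypothetical helper introduced only for illustration, and the input string is an assumed example containing the byte 0xFF, which never occurs in valid UTF-8.

```go
package main

import "fmt"

// rangeSteps is a hypothetical helper: it records the byte index and the rune
// produced by each iteration of a range loop over s.
func rangeSteps(s string) (idx []int, runes []rune) {
	for i, r := range s {
		idx = append(idx, i)
		runes = append(runes, r)
	}
	return idx, runes
}

func main() {
	// "a" (1 byte), the stray byte 0xFF, then U+00E9 (2 bytes in UTF-8).
	s := "a\xff\u00e9"
	idx, runes := rangeSteps(s)
	for k := range idx {
		fmt.Printf("index %d: %U\n", idx[k], runes[k])
	}
	// Output:
	// index 0: U+0061
	// index 1: U+FFFD
	// index 2: U+00E9
}
```

The stray byte yields U+FFFD, and the loop then advances by exactly one byte, so the following code point is still decoded correctly at index 2.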
There is no inconsistency.
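In Go, rune, the type that represents a Unicode code point, is an alias for int32. Indexing a string (str[i]) yields a single byte, and byte is an alias for uint8. Ranging over a string decodes its UTF-8 bytes and yields a rune for each code point, which is why v is reported as int32.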
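Because these are aliases rather than distinct types, values can be assigned between rune and int32 (or byte and uint8) without a conversion. The sketch below illustrates this; typeOf is a hypothetical helper added only to print type names.

```go
package main

import "fmt"

// typeOf is a hypothetical helper that returns the type name fmt reports for v.
func typeOf(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	var r rune = '世'
	var i int32 = r // no conversion needed: rune and int32 are the same type

	var b byte = 's'
	var u uint8 = b // likewise, byte and uint8 are the same type

	fmt.Println(typeOf(r), typeOf(i)) // int32 int32
	fmt.Println(typeOf(b), typeOf(u)) // uint8 uint8
	fmt.Println(i)                    // 19990
}
```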
Go is not an old language that is limited to the ASCII character set. Like most modern languages, Go uses Unicode: Go source code is UTF-8, so string literals hold UTF-8-encoded text.
For example,
package main

import "fmt"

func main() {
	helloworld := "Hello, 世界"
	fmt.Println(helloworld)
	// i is the byte index of each code point; r is the code point itself.
	for i, r := range helloworld {
		fmt.Println(i, r, string(r))
	}
}
Playground: https://play.golang.org/p/Q_iEzdlGxLu
Output:
Hello, 世界
0 72 H
1 101 e
2 108 l
3 108 l
4 111 o
5 44 ,
6 32
7 19990 世
10 30028 界