I'm porting a library from Java to Go. The library passes all parameters and return values as strings, and I must keep it that way because of subsequent steps. I noticed that when I convert a rune/int8 array to a string and then convert it back to a rune/int8 array, I get different values. I believe this is caused by Unicode characters. Is there a way to get the same values back?
package main

import "fmt"

func main() {
	runes := make([]rune, 3)
	runes[0] = 97
	runes[1] = -22
	runes[2] = 99
	s := string(runes)
	fmt.Println(runes)
	for _, r := range s {
		fmt.Println(r)
	}
}
Output:
[97 -22 99]
97
65533
99
The Go Programming Language Specification
Conversions to and from a string type
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer. Values outside the range of valid Unicode code points are converted to "\uFFFD".
Converting a slice of runes to a string type yields a string that is the concatenation of the individual rune values converted to strings.
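Type byte in Go is an alias for type uint8.
Type rune, which holds a Unicode code point, is an alias for int32. Valid code points range from U+0000 to U+10FFFF, excluding the surrogate range U+D800 to U+DFFF, so a negative value such as -22 is not a valid code point.
Go encodes Unicode code points (runes) as UTF-8 encoded strings.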
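To make the lossy step concrete, here is a minimal sketch that converts a few runes to strings and prints the resulting bytes. The helper name utf8Bytes is made up for this illustration; utf8.ValidRune comes from the standard unicode/utf8 package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf8Bytes (hypothetical helper) returns the bytes produced by
// converting a single rune to a string.
func utf8Bytes(r rune) []byte {
	return []byte(string(r))
}

func main() {
	// 97 and 0x10FFFF are valid code points; -22, 0xD800 (a surrogate)
	// and 0x110000 (past the Unicode range) are not.
	for _, r := range []rune{97, -22, 0xD800, 0x10FFFF, 0x110000} {
		fmt.Printf("%8d valid=%-5t bytes=% x\n", r, utf8.ValidRune(r), utf8Bytes(r))
	}
}
```

Every invalid value is encoded as the same three bytes, ef bf bd (U+FFFD), so the original number cannot be recovered from the string afterwards.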
For your example,
package main

import (
	"fmt"
	"unicode"
)

func main() {
	// A rune is an int32; valid Unicode code points are U+0000 to U+10FFFF
	runes := make([]rune, 3)
	runes[0] = 97
	runes[1] = -22 // invalid Unicode code point
	runes[2] = 99
	fmt.Println(runes)

	// Encode Unicode code points as UTF-8
	// Invalid code points are converted to the Unicode replacement character (U+FFFD)
	s := string(runes)
	fmt.Println(s)

	// Decode UTF-8 as Unicode code points
	for _, r := range s {
		fmt.Println(r, string(r), r == unicode.ReplacementChar)
	}
}
Playground: https://play.golang.org/p/AZUBd2iZp1F
Output:
[97 -22 99]
a�c
97 a false
65533 � true
99 c false
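If the goal is to get the same int8 values back, one possible approach is to skip rune conversion entirely and store each value as a single byte, since a Go string can hold arbitrary bytes and converting between []byte and string does not alter them. The sketch below assumes the subsequent steps treat the string as opaque data; the resulting string is generally not valid UTF-8, so if it must be printable text, a text encoding such as hex or base64 would be needed instead. The function names int8sToString and stringToInt8s are made up for this example.

```go
package main

import "fmt"

// int8sToString (hypothetical helper) stores each int8 as one raw byte.
// No UTF-8 encoding takes place, so nothing is replaced by U+FFFD.
func int8sToString(vals []int8) string {
	b := make([]byte, len(vals))
	for i, v := range vals {
		b[i] = byte(v) // -22 becomes 0xEA (234)
	}
	return string(b)
}

// stringToInt8s (hypothetical helper) reverses int8sToString.
// It indexes the string byte by byte instead of ranging over runes.
func stringToInt8s(s string) []int8 {
	vals := make([]int8, len(s))
	for i := 0; i < len(s); i++ {
		vals[i] = int8(s[i])
	}
	return vals
}

func main() {
	in := []int8{97, -22, 99}
	s := int8sToString(in)
	out := stringToInt8s(s)
	fmt.Println(in, out)
}
```

Note that the loop in stringToInt8s uses s[i] rather than range: ranging over the string would decode it as UTF-8 and yield 65533 for the invalid byte, which is the behavior seen in the question.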
References:
The Go Programming Language Specification