My understanding from reading the documentation was that a string is essentially an immutable []byte, and that one can easily convert between the two. However, when unmarshaling from JSON this doesn't seem to hold. Take the following example program:
package main

import (
	"encoding/json"
	"fmt"
)

type STHRaw struct {
	Hash []byte `json:"hash"`
}

type STHString struct {
	Hash string `json:"hash"`
}

func main() {
	bytes := []byte(`{"hash": "nuyHN9wx4lZL2L3Ir3dhZpmggTQEIHEZcC3DUNCtQsk="}`)

	stringHead := new(STHString)
	if err := json.Unmarshal(bytes, &stringHead); err != nil {
		return
	}

	rawHead := new(STHRaw)
	if err := json.Unmarshal(bytes, &rawHead); err != nil {
		return
	}

	fmt.Printf("String:\t\t%x\n", stringHead.Hash)
	fmt.Printf("Raw:\t\t%x\n", rawHead.Hash)
	fmt.Printf("Raw to string:\t%x\n", string(rawHead.Hash[:]))
}
This gives the following output:
String: 6e7579484e397778346c5a4c324c3349723364685a706d67675451454948455a63433344554e437451736b3d
Raw: 9eec8737dc31e2564bd8bdc8af77616699a0813404207119702dc350d0ad42c9
Raw to string: 9eec8737dc31e2564bd8bdc8af77616699a0813404207119702dc350d0ad42c9
Instead I would have expected to receive the same value each time.
What is the difference?
The designers of the encoding/json package decided that applications must provide valid UTF-8 text in string values, while []byte values may hold arbitrary byte sequences. The package base64-encodes []byte values to ensure that the resulting JSON string is valid UTF-8. The encoding of []byte values is described in the Marshal function documentation.
This decision was not dictated by the design of the Go language: the string type can contain arbitrary byte sequences, and the []byte type can contain valid UTF-8 text.
The designers could have used a flag in the field tag to indicate that a string or []byte value should be encoded, and which encoder to use, but that's not what they did.