Read input from the console as Unicode instead of UTF-8 (hex) in Golang

I am trying to read a user input with bufio in console. The text can have some special characters (é, à, ♫, ╬,...).

The code looks like this:

reader := bufio.NewReader(os.Stdin)
input, _ := reader.ReadString('\n')

If I type "é", for example, ReadString reads it as "c3 a9" instead of "00e9". How can I read the text input as Unicode instead of UTF-8? I need to use this value as a hash table key.

Thanks

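Unicode and UTF-8 are not alternatives to each other: Unicode assigns each character a code point (U+00E9 for é), while UTF-8 is one way to encode those code points as bytes (c3 a9 for é). A string can therefore be both Unicode and UTF-8 at the same time. I learned a lot about this by reading Strings, bytes, runes and characters in Go.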
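To make the distinction concrete, here is a minimal sketch that prints both views of the same string: the UTF-8 bytes that ReadString hands back, and the Unicode code points those bytes encode. The helper names utf8Hex and codePoints are made up for this illustration and are not part of any package.

```go
package main

import "fmt"

// utf8Hex returns the UTF-8 bytes of s as space-separated hex.
// (Hypothetical helper, named for this example only.)
func utf8Hex(s string) string {
	return fmt.Sprintf("% x", s)
}

// codePoints returns the Unicode code points of s in U+XXXX notation.
// (Hypothetical helper, named for this example only.)
func codePoints(s string) string {
	return fmt.Sprintf("%U", []rune(s))
}

func main() {
	s := "é"
	fmt.Println(utf8Hex(s))    // c3 a9
	fmt.Println(codePoints(s)) // [U+00E9]
}
```

Both lines describe the same string; nothing about the string changes between them, only how it is viewed.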

To answer your question,

You can use DecodeRuneInString from the unicode/utf8 package.

s := "é"
r, _ := utf8.DecodeRuneInString(s) // r avoids shadowing the built-in type name rune
fmt.Printf("%x", r)                // prints e9

DecodeRuneInString(s) returns the first UTF-8 encoded character (rune) in s along with that character's width in bytes. So if you want the Unicode code point of each rune in a string, here is how to do it. This is the example from the linked documentation, only slightly modified.

str := "Hello, 世界"

for len(str) > 0 {
    r, size := utf8.DecodeRuneInString(str)
    fmt.Printf("%x %v\n", r, size)

    str = str[size:]
}


Alternatively, as Juergen points out, you can use a range loop over the string to get the runes it contains.

str := "Hello, 世界"

for _, r := range str {
    fmt.Printf("%x\n", r)
}


Go strings are conceptually a read-only slice of bytes. The encoding of those bytes is not specified, but string literals written in Go source (without byte escapes) are UTF-8, and using UTF-8 for all other strings is the recommended approach.
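A small sketch of what "the encoding is not specified" means in practice: len counts bytes, not characters, and a string can hold bytes that are not valid UTF-8 at all. The describe function is a made-up helper for this example; utf8.RuneCountInString and utf8.ValidString are from the standard unicode/utf8 package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe reports the byte length, the rune count, and whether s is valid UTF-8.
// (Hypothetical helper, named for this example only.)
func describe(s string) (nBytes, nRunes int, valid bool) {
	return len(s), utf8.RuneCountInString(s), utf8.ValidString(s)
}

func main() {
	// A literal typed in source is UTF-8: é is stored as the two bytes c3 a9.
	fmt.Println(describe("é")) // 2 1 true

	// Byte escapes can put arbitrary, non-UTF-8 data into a string.
	fmt.Println(describe("\xff\xfe")) // 2 2 false
}
```

In the second case each invalid byte is counted as one rune of width 1, which is why the rune count is 2 even though the string is not valid UTF-8.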

Go provides convenience functions for accessing UTF-8 as Unicode code points (or runes, in Go-speak). A range loop over a string does the UTF-8 decoding for you. Converting to []rune gives you a rune slice, i.e. the Unicode code points in order. These conveniences only work on UTF-8 encoded strings and byte slices. I would strongly suggest using UTF-8 internally.

An example:

package main

import (
  "bufio"
  "fmt"
  "os"
)

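func main() {
  reader := bufio.NewReader(os.Stdin)
  // Read one line; the returned string still includes the trailing '\n'.
  input, _ := reader.ReadString('\n')

  // Indexing a string yields individual bytes of the UTF-8 encoding.
  fmt.Println("non-range loop - bytes")
  for i := 0; i < len(input); i++ {
    fmt.Printf("%d %d %[2]x\n", i, input[i])
  }

  // Ranging over a string decodes UTF-8; idx is the byte offset of each rune.
  fmt.Println("range-loop - runes")
  for idx, r := range input {
    fmt.Printf("%d %d %[2]c\n", idx, r)
  }

  // Converting to []rune decodes the whole string into code points.
  fmt.Println("converted to rune slice")
  rs := []rune(input)
  fmt.Printf("%#v\n", rs)
}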

With the input: X é X

    non-range loop - bytes
    0 88 58
    1 32 20
    2 195 c3
    3 169 a9
    4 32 20
    5 88 58
    6 10 a
    range-loop - runes
    0 88 X
    1 32
    2 233 é
    4 32
    5 88 X
    6 10

    converted to rune slice
    []int32{88, 32, 233, 32, 88, 10}