Storing Unicode characters in Go

I'm creating a data structure for storing single Unicode characters, which I can then compare.

Two Questions:

  1. What data types do I use?

    type ds struct {
        char Char // What should Char be so that I can safely compare two ds?
    }

  2. I need a way to compare the first character of any two Unicode strings. Is there a simple way to do that? Basically, how do I retrieve the first Unicode character of a string?

Like this: type Char rune.

Pay attention to "compare": that is a complicated thing with Unicode. While code points (runes) are easy to compare numerically (U+0020 == U+0020; U+1234 < U+2345), this might or might not be what you want given case, combining characters, and whatever else Unicode offers.
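
As a minimal sketch of the pitfalls mentioned here, the following program uses only the standard library to show that numeric rune comparison treats 'A' and 'a' as different, and that the precomposed "é" (U+00E9) and the decomposed "e" followed by U+0301 start with different runes even though they render the same. The helper name sameFirstRune is made up for illustration.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// sameFirstRune is a hypothetical helper: it reports whether two strings
// start with the same code point. It does no case folding or normalization.
func sameFirstRune(a, b string) bool {
	ra, _ := utf8.DecodeRuneInString(a)
	rb, _ := utf8.DecodeRuneInString(b)
	return ra == rb
}

func main() {
	// Case: 'A' (U+0041) and 'a' (U+0061) are different code points.
	fmt.Println(sameFirstRune("Apple", "apple")) // false

	// Lowercasing both runes first is one simple way to ignore case.
	fmt.Println(unicode.ToLower('A') == unicode.ToLower('a')) // true

	// Combining characters: both strings render as "é".
	precomposed := "\u00e9" // single code point U+00E9
	decomposed := "e\u0301" // 'e' followed by combining acute accent U+0301
	fmt.Println(sameFirstRune(precomposed, decomposed)) // false
}
```

Treating the two forms of "é" as equal requires Unicode normalization, which is not part of the standard library; the golang.org/x/text/unicode/norm package is one option for that.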

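  1. To compare the first characters of two UTF-8 strings, compare their rune values. A rune is an int32 that holds a Unicode code point. Use the standard package "unicode/utf8": utf8.DecodeRuneInString decodes the first rune of the string passed to it and returns the rune together with its width in bytes. Passing "test[0:]" is the same as passing the whole string.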

        test := "春节"
        runeValue, width := utf8.DecodeRuneInString(test[0:])
        fmt.Println(runeValue,width)
        fmt.Printf("%#U %d", runeValue, runeValue)
    

Now you can compare the rune values of the two strings' first characters using the == operator.

  2. Also, store the value as a string if you want to keep the whole text rather than a single code point.

    type ds struct {
        char string // What should Char be so that I can safely compare two ds?
    }
    

Complete code demonstrating this:

    package main

    import (
        "fmt"
        "unicode/utf8"
    )

    type ds struct {
        char string // What should Char be so that I can safely compare two ds?
    }

    func main() {
        fmt.Println("Hello, playground")

        ds1 := ds{"春节"}
        ds2 := ds{"春节"}

        // Decode the first rune (code point) of each string.
        runeValue1, _ := utf8.DecodeRuneInString(ds1.char[0:])
        runeValue2, _ := utf8.DecodeRuneInString(ds2.char[0:])

        fmt.Printf("%#U %#U", runeValue1, runeValue2)

        if runeValue1 == runeValue2 {
            fmt.Println("\nFirst Char Same")
        } else {
            fmt.Println("\nDifferent")
        }
    }


From Volker's answer, we can just use rune to compare.

  1. type Char rune
  2. To get the first Unicode character, we can simply do []rune(str)[0] (note that this panics if str is empty).
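
A minimal sketch combining both points, assuming a hypothetical helper named firstChar. It uses utf8.DecodeRuneInString instead of []rune(str)[0], so it does not convert the whole string and does not panic on an empty string. The ok result is an addition for illustration, not something the answers above require.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Char stores a single Unicode code point; values compare with == and <.
type Char rune

// firstChar (hypothetical helper) returns the first code point of s.
// ok is false when s is empty or does not start with valid UTF-8.
func firstChar(s string) (c Char, ok bool) {
	r, size := utf8.DecodeRuneInString(s)
	// DecodeRuneInString returns (RuneError, 0) for an empty string and
	// (RuneError, 1) for an invalid encoding.
	if r == utf8.RuneError && size <= 1 {
		return 0, false
	}
	return Char(r), true
}

func main() {
	a, _ := firstChar("春节")
	b, _ := firstChar("春天")
	fmt.Println(a == b) // true: both strings start with the same code point

	_, ok := firstChar("")
	fmt.Println(ok) // false
}
```

Because Char is just a named rune type, a struct such as ds can hold a Char field and two values can be compared directly with ==, subject to the caveats about case and combining characters.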