Outputting unquoted Unicode in Go

I'm using goyaml as a YAML beautifier. By loading and dumping a YAML file, I can source-format it. I unmarshal the data from a YAML source file into a struct, marshal the struct back to bytes, and write those bytes to an output file. But the process turns my Unicode strings into quoted strings full of literal \u escape sequences, and I don't know how to reverse it.

Example input subtitle.yaml:

line: 你好

I've stripped everything down to the smallest reproducible problem. Here's the code, using _ to discard the errors, none of which occur here:

package main

import (
    "io/ioutil"
    //"unicode/utf8"
    //"fmt"

    // The package declared by gopkg.in/yaml.v1 is named yaml, so alias it
    // to goyaml to match the calls below.
    goyaml "gopkg.in/yaml.v1"
)

type Subtitle struct {
    Line string
}

func main() {
    filename := "subtitle.yaml"
    in, _ := ioutil.ReadFile(filename)
    var subtitle Subtitle
    _ = goyaml.Unmarshal(in, &subtitle)
    out, _ := goyaml.Marshal(&subtitle)

    //for len(out) > 0 { // For debugging, see what the runes are
    //  r, size := utf8.DecodeRune(out)
    //  fmt.Printf("%c ", r)
    //  out = out[size:]
    //}

    _ = ioutil.WriteFile(filename, out, 0644)
}

Actual output subtitle.yaml:

line: "\u4F60\u597D"

I want to reverse the weirdness in goyaml after I get the variable out.

The commented-out rune-printing code block, which adds spaces between runes for clarity, outputs the following. It shows that the Unicode runes aren't being decoded; the escape sequences are emitted literally:

l i n e :   " \ u 4 F 6 0 \ u 5 9 7 D "
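
For reference, the same debugging loop can be run in isolation, without goyaml. This is only a sketch: the helper name runes and the two sample byte slices are made up for illustration. It contrasts the escaped form, which is plain ASCII, with the form I actually want, where each character is a single multi-byte rune:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runes splits b into its decoded runes, one string per rune.
// (Hypothetical helper, mirroring the commented-out debug loop.)
func runes(b []byte) []string {
	var out []string
	for len(b) > 0 {
		r, size := utf8.DecodeRune(b)
		out = append(out, string(r))
		b = b[size:]
	}
	return out
}

func main() {
	escaped := []byte(`"\u4F60\u597D"`) // what Marshal produced: ASCII only
	decoded := []byte(`"你好"`)           // what I want: two 3-byte runes in quotes

	fmt.Println(len(runes(escaped)), runes(escaped))
	fmt.Println(len(runes(decoded)), runes(decoded))
}
```

The escaped slice decodes to 14 one-byte runes, while the desired slice decodes to 4 runes (two quotes plus two Chinese characters).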

How can I unquote out, before writing it to the output file, so that the output looks like the input (albeit beautified)?

Desired output subtitle.yaml:

line: "你好"

Temporary Solution

I've filed https://github.com/go-yaml/yaml/issues/11. In the meantime, @bobince's tip on yaml_emitter_set_unicode was helpful in uncovering the problem. It was defined as a C binding but never called (and there was no option to set it)! I changed encode.go and added yaml_emitter_set_unicode(&e.emitter, true) to line 20, and everything works as expected. It would be better to make it optional, but that would require a change in the Marshal API.

I had a similar issue and used the following to work around the bug in goyaml.Marshal(). (*Regexp).ReplaceAllFunc is your friend here: you can use it to expand the escaped Unicode runes in the byte slice. It's probably a little too dirty for production, but it works for the example ;-)

package main

import (
    "io/ioutil"
    "regexp"
    "strconv"
    "unicode/utf8"

    "launchpad.net/goyaml"
)

type Subtitle struct {
    Line string
}

// reFind matches a whole `key: "value"` line whose quoted value contains at
// least one \u escape. (?m) makes ^ and $ match at every line, so files with
// more than one line are handled too.
var reFind = regexp.MustCompile(`(?m)^\s*[^\s\:]+\:\s*".*\\u.*"\s*$`)

// reFindU matches a single \uXXXX escape.
var reFindU = regexp.MustCompile(`\\u[0-9a-fA-F]{4}`)

func expandUnicodeInYamlLine(line []byte) []byte {
    // TODO: restrict this to the quoted string value
    return reFindU.ReplaceAllFunc(line, expandUnicodeRune)
}

// expandUnicodeRune turns one \uXXXX escape into the UTF-8 bytes of its rune.
func expandUnicodeRune(esc []byte) []byte {
    ri, err := strconv.ParseInt(string(esc[2:]), 16, 32)
    if err != nil || !utf8.ValidRune(rune(ri)) {
        return esc // leave unparsable escapes and surrogate halves untouched
    }
    r := rune(ri)
    repr := make([]byte, utf8.RuneLen(r))
    utf8.EncodeRune(repr, r)
    return repr
}

func main() {
    filename := "subtitle.yaml"
    filenameOut := "subtitleout.yaml"
    in, _ := ioutil.ReadFile(filename)
    var subtitle Subtitle
    _ = goyaml.Unmarshal(in, &subtitle)
    out, _ := goyaml.Marshal(&subtitle)

    out = reFind.ReplaceAllFunc(out, expandUnicodeInYamlLine)
    _ = ioutil.WriteFile(filenameOut, out, 0644)
}
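
If you want to try the expansion step without installing goyaml, here is a standard-library-only sketch of the same idea. The names escapeRE and expandEscapes are hypothetical, and the input is a hard-coded copy of the marshalled bytes from the question. Unlike the version above, it also skips escaped backslashes, so a literal `\\u4F60` in the YAML is left alone. It still assumes that \uXXXX sequences only ever appear inside double-quoted scalars, because it scans the whole buffer rather than parsing YAML.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"unicode/utf8"
)

// escapeRE matches either an escaped backslash (kept as is) or a \uXXXX
// escape (expanded). Trying `\\` first means `\\u4F60` stays literal.
var escapeRE = regexp.MustCompile(`\\\\|\\u[0-9a-fA-F]{4}`)

// expandEscapes is a hypothetical helper, not part of goyaml. It replaces
// every \uXXXX escape in b with the UTF-8 encoding of that code point.
func expandEscapes(b []byte) []byte {
	return escapeRE.ReplaceAllFunc(b, func(m []byte) []byte {
		if len(m) != 6 { // the two-byte escaped-backslash alternative
			return m
		}
		v, err := strconv.ParseUint(string(m[2:]), 16, 32)
		if err != nil || !utf8.ValidRune(rune(v)) {
			return m // leave surrogate halves and parse failures alone
		}
		return []byte(string(rune(v)))
	})
}

func main() {
	// Stand-in for the bytes returned by goyaml.Marshal in the question.
	out := []byte("line: \"\\u4F60\\u597D\"\n")
	fmt.Printf("%s", expandEscapes(out))
}
```

Characters outside the Basic Multilingual Plane are not covered: a lone \uXXXX surrogate half is left untouched rather than being combined with its pair.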