How do I handle hex values in YAML using Go and Python?

I have the following simple Go program:

package main

import (
    "fmt"

    yaml "gopkg.in/yaml.v2"
)

type Test struct {
    SomeStringWithQuotes string `yaml:"someStringWithQuotes"`
    SomeString           string `yaml:"someString"`
    SomeHexValue         string `yaml:"someHexValue"`
}

func main() {
    t := Test{
        SomeStringWithQuotes: "\"Hello World\"",
        SomeString:           "Hello World",
        SomeHexValue:         "0xDef9C64256DeE61ebf5B212238df11C7E532e3B7",
    }
    yamlBytes, _ := yaml.Marshal(t)
    fmt.Print(string(yamlBytes))
}

This prints the following, which shows that the Go library decides for itself when to quote a string:

someStringWithQuotes: '"Hello World"'
someString: Hello World
someHexValue: 0xDef9C64256DeE61ebf5B212238df11C7E532e3B7

However, when I try to read this YAML using the following Python script:

import yaml

yaml_str = """
someStringWithQuotes: '"Hello World"'
someString: Hello World
someHexValue: 0xDef9C64256DeE61ebf5B212238df11C7E532e3B7
"""

print(yaml.safe_load(yaml_str))

It parses the hex value as an integer. If I now serialize it back to YAML using this code:

import yaml
import sys

yaml_str = """
someStringWithQuotes: '"Hello World"'
someString: Hello World
someHexValue: 0xDef9C64256DeE61ebf5B212238df11C7E532e3B7
"""

print(yaml.dump(yaml.safe_load(yaml_str)))

I get:

someHexValue: 1272966107484048169783147972546098614451903325111
someString: Hello World
someStringWithQuotes: '"Hello World"'

How can I best make sure that the hex format is preserved? Unfortunately, I have no influence over the code on the Go side (but a Go-side solution is still welcome for other people who try to do similar things).

The Go library writes that hex string without quotes, so in the resulting YAML it reads as a number:

someHexValue: 0xDef9C64256DeE61ebf5B212238df11C7E532e3B7

If that is the YAML it produces, then Python is right to treat it as a number.

A band-aid for this in Python is to convert it back to hex using:

hex(1272966107484048169783147972546098614451903325111)
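
As a small sketch of what that band-aid does and does not recover: `hex()` always produces lower-case digits without leading zeros, so the original mixed-case spelling is not restored. The helper `to_fixed_hex` below is a hypothetical name introduced only for illustration, and the width of 40 digits is an assumption based on the value in the question.

```python
original = "0xDef9C64256DeE61ebf5B212238df11C7E532e3B7"

value = int(original, 16)   # the integer a YAML loader ends up with
restored = hex(value)       # the band-aid conversion

print(restored)
print(restored == original)           # False: the mixed case is gone
print(restored == original.lower())   # True: only the digits survive


def to_fixed_hex(number, digits=40):
    """Hypothetical helper: format with a fixed number of hex digits,
    so that leading zeros are not dropped the way hex() drops them."""
    return f"0x{number:0{digits}x}"


print(hex(int("0x00ff", 16)))                      # 0xff
print(to_fixed_hex(int("0x00ff", 16), digits=4))   # 0x00ff
```

If the exact casing of the value carries meaning, converting back from the integer is therefore not enough.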

The YAML spec treats an unquoted hex literal like that as a number.

You can load and dump that output in Python while preserving the hex value using ruamel.yaml (disclaimer: I am the author of that Python package):

import sys
import ruamel.yaml

yaml_str = """\
someHexValue: 0xDef9C64256DeE61ebf5B212238df11C7E532e3B7
someString: Hello World
someStringWithQuotes: '"Hello World"'
"""

yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)

which gives:

someHexValue: 0xDEF9C64256DEE61EBF5B212238DF11C7E532E3B7
someString: Hello World
someStringWithQuotes: '"Hello World"'
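
Note that the output above upper-cases the hex digits. If the exact spelling of the value matters, one possible alternative is to keep every scalar as a string while loading. The sketch below uses PyYAML's `BaseLoader` for that; it assumes PyYAML is installed and that it is acceptable for genuine numbers and booleans elsewhere in the document to come back as strings too.

```python
import yaml  # PyYAML

yaml_str = """\
someStringWithQuotes: '"Hello World"'
someString: Hello World
someHexValue: 0xDef9C64256DeE61ebf5B212238df11C7E532e3B7
"""

# BaseLoader skips implicit type resolution, so every scalar stays a str.
data = yaml.load(yaml_str, Loader=yaml.BaseLoader)
print(type(data["someHexValue"]).__name__, data["someHexValue"])

# On dumping, a str that would resolve to an integer gets quoted,
# so the value survives another load unchanged.
dumped = yaml.dump(data, default_flow_style=False)
print(dumped)
```

The dumped document then carries the hex value in quotes, which is what the Go side should have produced in the first place.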

The actual output of Go is incorrect. If you output the string "0xDef9C64256DeE61ebf5B212238df11C7E532e3B7" using Python, you will see that it is written with quotes (I am using ruamel.yaml here, but this works the same way in PyYAML):

import sys
import ruamel.yaml

data = dict(someHexValue="0xDef9C64256DeE61ebf5B212238df11C7E532e3B7")

yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)

which gives:

someHexValue: '0xDef9C64256DeE61ebf5B212238df11C7E532e3B7'

Whether this string needs quoting is determined by representing it "plain" (i.e. without quotes) and then trying to resolve that plain form, to check that the original type (string) comes back. Here it does not: the plain form resolves to an integer, so the representer part of the dumping process decides that quotes are necessary. (If you ever look at the loading and dumping code and wonder why the resolver is used by both: this is the reason the dumper needs access to resolver.py as well.)

This works the same way for strings like "True" and "2019-02-08", which also get quoted (in order not to "confuse" them with a boolean or a date).

This is a rather expensive computational process, and there are of course other ways of determining whether quotes are needed.
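
The round-trip check described above can be sketched roughly as follows. This is a simplified illustration, not the code of ruamel.yaml or PyYAML: the regular expressions and the names `resolve_plain` and `needs_quotes` are assumptions made for the example, and real dumpers also quote for other reasons (special characters, leading spaces, and so on).

```python
import re

# Simplified stand-ins for a resolver's implicit-type patterns (assumed,
# not taken from any real library).
_BOOL = re.compile(r"^(true|false|yes|no|on|off)$", re.IGNORECASE)
_INT = re.compile(r"^[-+]?(0x[0-9a-fA-F]+|0o?[0-7]+|[0-9]+)$")
_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def resolve_plain(text):
    """Return the type name a plain (unquoted) scalar would load as."""
    if _BOOL.match(text):
        return "bool"
    if _INT.match(text):
        return "int"
    if _DATE.match(text):
        return "date"
    return "str"


def needs_quotes(text):
    """A string needs quotes if its plain form would not load back as a string."""
    return resolve_plain(text) != "str"


samples = [
    "Hello World",
    "0xDef9C64256DeE61ebf5B212238df11C7E532e3B7",
    "True",
    "2019-02-08",
]
for sample in samples:
    print(f"{sample!r}: needs quotes = {needs_quotes(sample)}")
```

Only "Hello World" comes back as a string in this sketch, so it is the only sample that can safely be written without quotes.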

In Go, this works the same way, but there is an error in the relevant code in resolve.go:

        intv, err := strconv.ParseInt(plain, 0, 64)
        if err == nil {
            if intv == int64(int(intv)) {
                return yaml_INT_TAG, int(intv)
            } else {
                return yaml_INT_TAG, intv
            }
        }

From the documentation for ParseInt:

If base == 0, the base is implied by the string's prefix: base 16 for "0x", base 8 for "0", and base 10 otherwise.

The problem is of course that neither YAML nor Python restricts the size of an integer, but in Go they are restricted to 64 bits. So in the code above, ParseInt returns an error for this value, which does not fit in 64 bits, and Go concludes that the string doesn't need quoting. (I reported this as a bug in the go-yaml library.)
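
To make the size mismatch concrete, here is a rough Python emulation of a 64-bit-limited integer check. It is only a sketch of the idea, not a port of the Go code: `parse_int64` and `resolves_as_int_64bit` are hypothetical names, and Python's `int(text, 0)` does not handle every prefix the same way Go's `ParseInt` does, so it should only be read for the hex case.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def parse_int64(text):
    """Parse an integer literal, but fail if it does not fit in 64 bits."""
    value = int(text, 0)  # base 0: the "0x" prefix selects base 16
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError("value out of range for a 64-bit integer")
    return value


def resolves_as_int_64bit(text):
    """True if a resolver limited to 64-bit integers would see an integer."""
    try:
        parse_int64(text)
        return True
    except (ValueError, OverflowError):
        return False


address = "0xDef9C64256DeE61ebf5B212238df11C7E532e3B7"

print(int(address, 0).bit_length())      # 160
print(resolves_as_int_64bit("0xFF"))     # True: small hex is seen as an integer
print(resolves_as_int_64bit(address))    # False: the parse fails, so it looks like a plain string
```

A resolver with this limit treats the long hex value as "not an integer" and therefore sees no need for quotes, while an unlimited-precision resolver on the reading side does see an integer.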

The Go Marshal function doesn't seem to have a flag to enforce quoting, as you can do in ruamel.yaml by setting yaml.default_style = '"'.