Golang: decoding/unmarshaling invalid unicode in JSON

I am fetching JSON files in Go that are not formatted consistently. For example, I can have the following:

{"email": "\"blah.blah@blah.com\""}
{"email": "robert@gmail.com"}
{"name": "m\303\203ead"}

The escape sequences in the last line are a problem. Decoding with json.Decoder, given:

{"name": "m\303\203ead"}

I get the error: invalid character '3' in string escape code
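To make the failure easy to reproduce, here is a minimal sketch that feeds the three sample lines to encoding/json. JSON only allows the escapes \", \\, \/, \b, \f, \n, \r, \t and \uXXXX, so a backslash followed by a digit is rejected. The helper name decode and the use of a map are assumptions made for this illustration, not part of my real code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decode is a hypothetical helper that unmarshals one JSON object into a map.
func decode(data string) (map[string]string, error) {
	var v map[string]string
	err := json.Unmarshal([]byte(data), &v)
	return v, err
}

func main() {
	// Raw string literals keep every backslash, just like the bytes in a file.
	lines := []string{
		`{"email": "\"blah.blah@blah.com\""}`, // \" is a valid JSON escape
		`{"email": "robert@gmail.com"}`,
		`{"name": "m\303\203ead"}`, // \3 is not a valid JSON escape
	}
	for _, line := range lines {
		v, err := decode(line)
		fmt.Printf("value=%v err=%v\n", v, err)
	}
}
```

The first two lines decode cleanly; only the third one returns a syntax error.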

I have tried several approaches to normalise my data, for example going through a string array (it works, but there are too many edge cases), or even filtering out escape characters.

Finally, I came across this article: http://blog.golang.org/normalization and the solution it proposes seemed very interesting.

I have tried the following:

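// isMn reports whether r is a nonspacing mark (Unicode category Mn).
isMn := func(r rune) bool {
    return unicode.Is(unicode.Mn, r)
}

t := transform.Chain(norm.NFC, transform.RemoveFunc(isMn), norm.NFD)

fileReader, err := bucket.GetReader(filename)
if err != nil {
    // handle the error
}

transformReader := transform.NewReader(fileReader, t)

decoder := json.NewDecoder(transformReader)

for {
    var dataModel Model
    if err := decoder.Decode(&dataModel); err == io.EOF {
        break
    } else if err != nil {
        // handle the decode error; the decoder cannot continue after it
        break
    }
    // DO SOMETHING with dataModel
}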

With Model being:

type Model struct {
    Name  string `json:"name" bson:"name"`
    Email string `json:"email" bson:"email"` 
}

I have tried several variations of it, but haven't been able to get it working.

So my question is: how can I easily handle decoding/unmarshaling JSON data with different encodings, knowing that I have no control over those JSON files?

If you are reading this, thank you anyway.

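You can use json.RawMessage instead of string. The field then keeps the raw JSON text of the value, and json.Decoder does not unescape it. Note that the decoder still checks the syntax of the whole document, so this helps only when the input contains the actual bytes rather than a literal backslash sequence such as \303.

Playground: http://play.golang.org/p/fB-38KGAO0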

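package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

type Model struct {
    // N keeps the raw JSON text of the value, quotes included.
    N json.RawMessage `json:"name" bson:"name"`
}

// Name returns the raw JSON value as a string (still quoted and escaped).
func (m *Model) Name() string {
    return string(m.N)
}

func main() {
    // In an interpreted Go string literal, \303\203 are octal byte escapes,
    // so s contains the bytes 0xC3 0x83 rather than literal backslashes.
    s := "{\"name\": \"m\303\203ead\"}"
    r := strings.NewReader(s)
    d := json.NewDecoder(r)
    m := Model{}

    fmt.Println(d.Decode(&m))
    fmt.Println(m.Name())
}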
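One caveat is worth checking before relying on this. As far as I can tell, encoding/json validates the syntax of the entire input before it fills any field, so json.RawMessage does not get past an invalid escape that is literally present in the data. The sketch below compares the two cases; rawModel and decodeRaw are hypothetical names used only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawModel mirrors the Model above, keeping the name as raw JSON.
type rawModel struct {
	N json.RawMessage `json:"name"`
}

// decodeRaw is a hypothetical helper returning the raw "name" value.
func decodeRaw(data string) (json.RawMessage, error) {
	var m rawModel
	err := json.Unmarshal([]byte(data), &m)
	return m.N, err
}

func main() {
	// Literal backslashes, as they would appear in a file on disk.
	_, err := decodeRaw(`{"name": "m\303\203ead"}`)
	fmt.Println("literal backslashes:", err)

	// Real bytes 0xC3 0x83 (the Go compiler resolves the octal escapes).
	n, err := decodeRaw("{\"name\": \"m\303\203ead\"}")
	fmt.Println("real bytes:", string(n), err)
}
```

In the second case the value comes back with its surrounding quotes, so it still needs to be unquoted (for example with a second json.Unmarshal into a string) before use.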

Edit: You can also use a regex, though I'm not sure how viable that is for you: http://play.golang.org/p/VYJKTKmiYm

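// octalEscape matches a backslash followed by three digits, e.g. \303.
var octalEscape = regexp.MustCompile(`\b(\\\d\d\d)`)

// cleanUp rewrites each \ddd as \u0ddd so the input becomes valid JSON.
// Note that this reuses the digits as hex: \303 becomes U+0303, not the
// byte 0xC3 that octal 303 stands for.
func cleanUp(s string) string {
    return octalEscape.ReplaceAllStringFunc(s, func(s string) string {
        return `\u0` + s[1:]
    })
}

func main() {
    // A raw string keeps the backslashes literal, as in the JSON files.
    s := `{"name": "m\303\203ead"}`
    s = cleanUp(s)
    r := strings.NewReader(s)
    d := json.NewDecoder(r)
    m := Model{}
    fmt.Println(d.Decode(&m))
    fmt.Println(m.Name())
}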