Equivalent of Perl 5 utf8::decode in Go

What is the equivalent to this Perl program, in Go?

use utf8;
my $bin = "..."; # the data may come from a file, the network, etc.
utf8::decode( $bin ); # Encode::decode( 'UTF-8', $bin ) also works

utf8::decode is defined in http://perldoc.perl.org/utf8.html

I have tried several kinds of transformation, normalization, and conversion to bytes or runes, without success.

Context:

I am using Sereal ( https://github.com/Sereal/Sereal ) to serialize some data structures in a NoSQL database. At first we had a Perl 5.12.2 version; now we are using Go. Sereal is a binary format.

However, some Perl programs somehow performed an extra JSON encode/decode on the final BLOB (treating it as a UTF-8 string). Perl programs can still read and write the data, but when Go tries to decode it, it does not work.

Of course, the best solution is to stop this double encoding, but since we already have entries in our database in this format, I want to see whether I can decode them.

Proof of concept: here is my JSON, which holds my Sereal bytes converted to a UTF-8 string.

$ hexdump -C min.bin 
00000000  22 3d c3 b3 72 6c 5c 75  30 30 30 33 5c 75 30 30  |"=..rl\u0003\u00|
00000010  30 30 51 6b 6c 61 73 74  5f 75 70 64 61 74 65 20  |00Qklast_update |
00000020  c2 82 c3 9a c3 a6 c3 93  5c 75 30 30 30 35 22     |........\u0005"|
0000002f
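
To make the double encoding concrete, here is a minimal Go sketch that produces bytes shaped like the start of the dump above. It is an assumption about how the data ended up this way, not a trace of what the Perl programs actually do: each raw byte is treated as a code point in U+0000..U+00FF, and the resulting string is then JSON-encoded as UTF-8. The names widen and doubleEncode are hypothetical, and the six input bytes are what I assume the start of the original Sereal document to be.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// widen treats every byte as the code point with the same value
// (U+0000..U+00FF), which is how a byte string looks when it is
// handled as a character string.
func widen(raw []byte) string {
    runes := make([]rune, len(raw))
    for i, b := range raw {
        runes[i] = rune(b)
    }
    return string(runes)
}

// doubleEncode JSON-encodes the widened string. Go emits UTF-8, so every
// code point above U+007F becomes two bytes and control characters
// become \u00XX escapes.
func doubleEncode(raw []byte) ([]byte, error) {
    return json.Marshal(widen(raw))
}

func main() {
    // Assumed start of the original Sereal document.
    raw := []byte{0x3d, 0xf3, 0x72, 0x6c, 0x03, 0x00}
    out, err := doubleEncode(raw)
    if err != nil {
        panic(err)
    }
    fmt.Printf("% x\n", out)
}
```

Comparing the printed bytes with the first row of the hexdump is a quick way to check whether this assumption about the damage holds for a given entry.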

decode.pl

use strict;
use warnings;
use JSON;
use Sereal::Decoder;
use Data::Dumper;
use File::Slurp;

my $data = read_file( "min.bin", { binmode => ':raw' } );
my $bin = JSON->new->allow_nonref(1)->decode( $data );
utf8::decode( $bin ); # MAGIC !!!
my $out = Sereal::Decoder->new->decode( $bin );
print(Dumper( $out ));

output

$VAR1 = {
          'last_update' => 1517923586
        };

decode.go

package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"

    "github.com/Sereal/Sereal/Go/sereal"
    "github.com/davecgh/go-spew/spew"
)

func main() {
    data, err := ioutil.ReadFile("min.bin")
    if err != nil {
        fmt.Printf("unexpected error while reading fixtures file: '%+v'\n", err)
    }
    var v interface{}
    err = json.Unmarshal(data, &v)
    spew.Dump(v, err)

    var vbody interface{}
    instance := sereal.NewDecoder()
    instance.PerlCompat = false
    str := v.(string)
    // something here to be able to decode?
    err = instance.Unmarshal([]byte(str), &vbody) // Unmarshal needs a pointer to fill in
    spew.Dump(vbody, err)
}

output

(string) (len=30) "=órl\x03\x00Qklast_update \u0082ÚæÓ\x05"
(interface {}) <nil>
(interface {}) <nil>
(*errors.errorString)(0xc42000e370)(bad header: it seems your document was accidentally UTF-8 encoded)
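
The error message is consistent with what []byte(str) does in Go: it returns the UTF-8 encoding of the string, so the code point U+00F3 comes out as the two bytes c3 b3 instead of the single byte f3. A small sketch of that check follows. It assumes the Sereal magic for this document is "=\xF3rl" (which matches the dump); classify and the two variables are hypothetical names, not part of the sereal package.

```go
package main

import (
    "bytes"
    "fmt"
)

var (
    // Assumed magic prefix of the raw Sereal document.
    magicRaw = []byte("=\xf3rl")
    // The same prefix after U+00F3 has been written out as UTF-8.
    magicAsUTF8 = []byte("=\xc3\xb3rl")
)

// classify reports which of the two prefixes the buffer starts with.
func classify(b []byte) string {
    switch {
    case bytes.HasPrefix(b, magicRaw):
        return "raw"
    case bytes.HasPrefix(b, magicAsUTF8):
        return "utf8-encoded"
    default:
        return "unknown"
    }
}

func main() {
    // Start of the string that json.Unmarshal returned.
    str := "=\u00f3rl\x03\x00"
    fmt.Printf("% x -> %s\n", []byte(str), classify([]byte(str)))
}
```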

Edit: this code works, but I do not think it is the proper way:

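  // s is the string returned by json.Unmarshal (str in decode.go);
  // this needs the "bytes" and "unicode/utf8" imports
  buff := &bytes.Buffer{}
  for len(s) > 0 {
      // decode one UTF-8 sequence and keep only the low byte of its code point
      r, size := utf8.DecodeRuneInString(s)
      buff.WriteByte(byte(r))
      s = s[size:]
  }

  rawBytes := buff.Bytes() // yeah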
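
One weakness of the loop above is that byte(r) silently truncates any code point above U+00FF, and invalid UTF-8 decodes to U+FFFD, which would be written as 0xFD. Below is a slightly stricter sketch of the same idea that returns an error in those cases. The helper name latin1Bytes is hypothetical, and whether this is the "proper" equivalent of utf8::decode is still an open question; it only covers the case where every code point is expected to fit in one byte.

```go
package main

import "fmt"

// latin1Bytes maps each code point of s to a single byte and fails if a
// code point does not fit (this includes U+FFFD from invalid UTF-8).
func latin1Bytes(s string) ([]byte, error) {
    out := make([]byte, 0, len(s))
    for _, r := range s {
        if r > 0xFF {
            return nil, fmt.Errorf("code point U+%04X does not fit in one byte", r)
        }
        out = append(out, byte(r))
    }
    return out, nil
}

func main() {
    // The string printed by spew.Dump in decode.go.
    s := "=\u00f3rl\x03\x00Qklast_update \u0082\u00da\u00e6\u00d3\x05"
    raw, err := latin1Bytes(s)
    if err != nil {
        panic(err)
    }
    // raw can then be passed to instance.Unmarshal(raw, &vbody).
    fmt.Printf("% x\n", raw)
}
```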

Regards

Tiago