What is the Go equivalent of this Perl program?
use utf8;
my $bin = "..."; # the data may come from a file, the network, etc.
utf8::decode( $bin ); # Encode::decode( 'UTF-8', $bin ) also works
utf8::decode is documented at http://perldoc.perl.org/utf8.html
I have tried several forms of transformation, normalization, and conversion to bytes/runes, all without success.
Context:
I am using Sereal ( https://github.com/Sereal/Sereal ) to serialize some data structures in a NoSQL database. We first had a Perl 5.12.2 implementation, and now we are using Go. Sereal is a binary format.
However, some Perl programs somehow applied an extra JSON encode/decode to the final BLOB (turning it into a UTF-8 string). Perl programs can still read and write this data, but decoding fails when Go tries to read it.
Of course, the best solution is to stop this double encoding, but since we already have entries in our database in this format, I want to see whether I can decode them.
Proof of concept: here is my JSON, containing the Sereal bytes converted to a UTF-8 string
$ hexdump -C min.bin
00000000 22 3d c3 b3 72 6c 5c 75 30 30 30 33 5c 75 30 30 |"=..rl\u0003\u00|
00000010 30 30 51 6b 6c 61 73 74 5f 75 70 64 61 74 65 20 |00Qklast_update |
00000020 c2 82 c3 9a c3 a6 c3 93 5c 75 30 30 30 35 22 |........\u0005"|
0000002f
decode.pl
use strict;
use warnings;
use JSON;
use Sereal::Decoder;
use Data::Dumper;
use File::Slurp;
my $data = read_file( "min.bin", { binmode => ':raw' } );
my $bin = JSON->new->allow_nonref(1)->decode( $data );
utf8::decode( $bin ); # MAGIC !!!
my $out = Sereal::Decoder->new->decode( $bin );
print(Dumper( $out ));
output
$VAR1 = {
'last_update' => 1517923586
};
decode.go
package main
import (
"encoding/json"
"fmt"
"io/ioutil"
"github.com/Sereal/Sereal/Go/sereal"
"github.com/davecgh/go-spew/spew"
)
func main() {
data, err := ioutil.ReadFile("min.bin")
if err != nil {
fmt.Printf("unexpected error while reading fixtures file: '%+v'\n", err)
}
var v interface{}
err = json.Unmarshal(data, &v)
spew.Dump(v, err)
var vbody interface{}
instance := sereal.NewDecoder()
instance.PerlCompat = false
str := v.(string)
// something here to be able to decode?
err = instance.Unmarshal([]byte(str), &vbody) // pass a pointer so the decoder can fill vbody
spew.Dump(vbody, err)
}
output
(string) (len=30) "=órl\x03\x00Qklast_update \u0082ÚæÓ\x05"
(interface {}) <nil>
(interface {}) <nil>
(*errors.errorString)(0xc42000e370)(bad header: it seems your document was accidentally UTF-8 encoded)
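To make the error above concrete, here is a minimal sketch that detects the condition it reports. It assumes the Sereal version 3 magic is the four bytes `=\xF3rl`, which matches the `=órl` prefix of the decoded string, and that its UTF-8 encoded form is `=\xC3\xB3rl`, which matches the `3d c3 b3 72 6c` at the start of the hexdump. The names `magicV3UTF8` and `looksUTF8Encoded` are hypothetical and not part of the sereal package.

```go
package main

import (
	"bytes"
	"fmt"
)

// magicV3UTF8 is the assumed Sereal v3 magic "=\xF3rl" after an accidental
// UTF-8 encoding: the byte 0xF3 becomes the two bytes 0xC3 0xB3.
var magicV3UTF8 = []byte("=\xC3\xB3rl")

// looksUTF8Encoded (hypothetical helper) reports whether doc starts with
// the UTF-8 encoded form of the magic rather than the raw one.
func looksUTF8Encoded(doc []byte) bool {
	return bytes.HasPrefix(doc, magicV3UTF8)
}

func main() {
	// []byte(str) in decode.go yields the UTF-8 bytes of the string,
	// so the magic arrives as 3d c3 b3 72 6c instead of 3d f3 72 6c.
	str := "=\u00f3rl\x03\x00"
	fmt.Println(looksUTF8Encoded([]byte(str)))
}
```

If this check is true, converting the string with `[]byte(str)` is not enough: each character has to be turned back into a single byte before the document is handed to the decoder.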
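edit: this code works, but I don't think it is the proper way:
s := str // assuming s is the string produced by json.Unmarshal above
buff := &bytes.Buffer{}
for len(s) > 0 {
// decode one rune and keep only its low byte (every rune here is <= U+00FF)
r, size := utf8.DecodeRuneInString(s)
buff.WriteByte(byte(r))
s = s[size:]
}
rawBytes := buff.Bytes() // yeah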
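For reference, the same idea can be written as a small function that also refuses input it cannot represent. This is only a sketch under the assumption that every character in the JSON-decoded string stands for exactly one original byte (that is, the string is the Latin-1 view of the Sereal bytes). The name `latin1Bytes` is hypothetical.

```go
package main

import "fmt"

// latin1Bytes (hypothetical helper) maps each rune in the range
// U+0000..U+00FF to the single byte with the same value. Any larger rune,
// including U+FFFD produced by invalid UTF-8, is reported as an error
// instead of being silently truncated.
func latin1Bytes(s string) ([]byte, error) {
	out := make([]byte, 0, len(s))
	for i, r := range s {
		if r > 0xFF {
			return nil, fmt.Errorf("rune %U at byte offset %d does not fit in one byte", r, i)
		}
		out = append(out, byte(r))
	}
	return out, nil
}

func main() {
	// First characters of the JSON-decoded string from the output above.
	raw, err := latin1Bytes("=\u00f3rl\x03\x00")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("% x\n", raw) // prints "3d f3 72 6c 03 00"
}
```

Unlike the loop above, which converts with `byte(r)` unconditionally, this version fails loudly if a stored entry contains a character above U+00FF, which would mean the entry was not produced by the double encoding described here.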
Regards
Tiago