Example string:
"\u0410\u043b\u0435\u043a\u0441\u0430\u043d\u0434\u0440\u044b!
\u0421\u043f\u0430\u0441\u0438\u0431\u043e \ud83d\udcf8 link.ru \u0437\u0430
#hashtag Русское слово, an English word"
Without this \ud83d\udcf8 my func works well:
    func convertUnicode(text string) string {
        // Wrap the text in double quotes so strconv.Unquote decodes the \uXXXX escapes.
        s, err := strconv.Unquote(`"` + text + `"`)
        if err != nil {
            // Error.Printf("can't convert: %s | err: %s\n", text, err)
            return text
        }
        return s
    }
My question is: how can I detect that the text contains this kind of entry, and how can I convert it to an emoji or remove it from the text? Thanks
Well, it is probably not that simple: neither \ud83d nor \udcf8 is a valid code point on its own, but together they form a surrogate pair, which UTF-16 uses to encode \U0001F4F8. strconv.Unquote handles each \uXXXX escape separately and does not join the two surrogate halves, so you have to combine them yourself.
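One way to do that combining, as a rough sketch rather than a definitive solution: find escaped surrogate pairs with a regular expression and either decode them with utf16.DecodeRune from the standard library or drop them, before passing the text on to convertUnicode. The names surrogatePair, hasSurrogatePair, combineSurrogates and removeSurrogates are made up for this example. The sketch assumes the escapes always appear as a well-formed high/low pair and that the backslash itself is not escaped (an input like \\ud83d would be mishandled).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"unicode/utf16"
)

// surrogatePair (hypothetical name) matches a high-surrogate escape
// (\ud800-\udbff) immediately followed by a low-surrogate escape (\udc00-\udfff).
var surrogatePair = regexp.MustCompile(
	`\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})`)

// hasSurrogatePair reports whether text contains an escaped surrogate pair.
func hasSurrogatePair(text string) bool {
	return surrogatePair.MatchString(text)
}

// combineSurrogates replaces each escaped pair with the single rune it encodes.
func combineSurrogates(text string) string {
	return surrogatePair.ReplaceAllStringFunc(text, func(m string) string {
		parts := surrogatePair.FindStringSubmatch(m)
		// The regexp guarantees four hex digits, so parsing cannot fail here.
		hi, _ := strconv.ParseUint(parts[1], 16, 32)
		lo, _ := strconv.ParseUint(parts[2], 16, 32)
		return string(utf16.DecodeRune(rune(hi), rune(lo)))
	})
}

// removeSurrogates deletes every escaped surrogate pair from text.
func removeSurrogates(text string) string {
	return surrogatePair.ReplaceAllString(text, "")
}

func main() {
	in := `\u0421\u043f\u0430\u0441\u0438\u0431\u043e \ud83d\udcf8 link.ru`
	fmt.Println(hasSurrogatePair(in))  // detection
	fmt.Println(combineSurrogates(in)) // pair replaced by the emoji, other escapes untouched
	fmt.Println(removeSurrogates(in))  // pair removed, other escapes untouched
}
```

The remaining \uXXXX escapes are left as they are, so the result of combineSurrogates (or removeSurrogates) can still be fed to convertUnicode from the question to decode the rest.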