I'm converting a Go program that decodes email messages. It currently runs iconv to do the actual decoding, which of course has overhead. I would like to use the golang.org/x/text/transform and golang.org/x/net/html/charset packages to do this. Here is working code:
// cs is the charset that the email body is encoded with, pulled from
// the Content-Type declaration.
enc, name := charset.Lookup(cs)
if enc == nil {
	log.Fatalf("Can't find %s", cs)
}
// body is the email body we're converting to utf-8
r := transform.NewReader(strings.NewReader(body), enc.NewDecoder())
// result contains the converted-to-utf8 email body
result, err := ioutil.ReadAll(r)
That works great except when it encounters illegal bytes, which unfortunately is not uncommon when dealing with email in the wild. ioutil.ReadAll() returns an error along with all the converted bytes up to the problem. Is there a way to tell the transform package to ignore illegal bytes? Right now we use the -c flag to iconv to do that. I've gone through the docs for the transform package, and I can't tell whether it's possible or not.
UPDATE: Here's a test program that shows the problem (the Go playground doesn't have the charset or transform packages...). The raw text was taken from an actual email. Yep, it's in English, and yep, the charset in the email was set to EUC-KR. I need it to ignore that apostrophe.
package main

import (
	"io/ioutil"
	"log"
	"strings"

	"golang.org/x/net/html/charset"
	"golang.org/x/text/transform"
)

func main() {
	raw := `So, at 64 kBps, or kilobits per second, you’re getting 8 kilobytes a second.`
	enc, _ := charset.Lookup("euc-kr")
	r := transform.NewReader(strings.NewReader(raw), enc.NewDecoder())
	result, err := ioutil.ReadAll(r)
	if err != nil {
		log.Printf("ReadAll returned %s", err)
	}
	log.Printf("RESULT: '%s'", string(result))
}
Here is the solution I went with. Instead of using a Reader, I allocate the destination buffer by hand and call the Transform() function directly. When Transform() errors out, I check for a short destination buffer and reallocate if necessary. Otherwise I skip a rune, assuming that it is the illegal character. For completeness, I should also check for a short input buffer, but I do not do so in this example.
raw := `So, at 64 kBps, or kilobits per second, you’re getting 8 kilobytes a second.`
enc, _ := charset.Lookup("euc-kr")
dst := make([]byte, len(raw))
d := enc.NewDecoder()
var (
	in  int
	out int
)
for in < len(raw) {
	// Do the transformation
	ndst, nsrc, err := d.Transform(dst[out:], []byte(raw[in:]), true)
	in += nsrc
	out += ndst
	if err == nil {
		// Completed transformation
		break
	}
	if err == transform.ErrShortDst {
		// Our output buffer is too small, so we need to grow it
		log.Printf("Short")
		t := make([]byte, (cap(dst)+1)*2)
		copy(t, dst)
		dst = t
		continue
	}
	// We're here because of at least one illegal character. Skip over
	// the current rune and try again. (This needs the unicode/utf8
	// package imported as well.)
	_, width := utf8.DecodeRuneInString(raw[in:])
	in += width
}
log.Printf("RESULT: '%s'", dst[:out])
enc.NewDecoder() returns an encoding.Decoder, which embeds a transform.Transformer. The doc of NewDecoder() says:
Transforming source bytes that are not of that encoding will not result in an error per se. Each byte that cannot be transcoded will be represented in the output by the UTF-8 encoding of '\uFFFD', the replacement rune.
This tells us that illegal bytes do not make the transform fail outright; they show up in the output as the replacement rune (also known as the error rune). Fortunately it is easy to strip those out.
golang.org/x/text/transform provides two helper functions we can use to solve this problem: Chain() takes a set of transformers and chains them together, and RemoveFunc() takes a function and removes all runes for which it returns true. Something like the following (untested) should work:
filter := transform.Chain(enc.NewDecoder(), transform.RemoveFunc(func(r rune) bool {
	return r == utf8.RuneError
}))
r := transform.NewReader(strings.NewReader(body), filter)
That should filter out all rune errors before they reach the reader and blow up. (utf8.RuneError comes from the unicode/utf8 package.)