How to skip "noise" in a stream of JSON objects?

I'm trying to get the following code to skip parse-error noise in a stream of JSON objects. Basically, I want it to skip the ERROR: ... lines and continue on to the next parseable record.

json.Decoder has a limited set of methods, so it's unclear how to move the decoder's position forward (say, one byte at a time) past the noise.

Wrapping the io.Reader (e.g. in a bufio.Reader) would let me skip to the end of the line, or one byte at a time, but doing so (understandably) does not affect the json.Decoder's read position, since the decoder reads ahead into its own buffer.

Is there a clean way to do this?

https://play.golang.org/p/riIDh9g1Rx

package main

import (
        "encoding/json"
        "fmt"
        "strings"
        "time"
)

type event struct {
        T    time.Time
        Desc string
}

var jsonStream = ` 
{"T":"2017-11-02T16:00:00-04:00","Desc":"window opened"}
{"T":"2017-11-02T16:30:00-04:00","Desc":"window closed"}
{"T":"2017-11-02T16:41:34-04:00","Desc":"front door opened"}
ERROR: retrieving event 1234
{"T":"2017-11-02T16:41:40-04:00","Desc":"front door closed"}
`

func main() {
        jsonReader := strings.NewReader(jsonStream)
        decodeStream := json.NewDecoder(jsonReader)

        i := 0
        for decodeStream.More() {
                i++
                var ev event
                if err := decodeStream.Decode(&ev); err != nil {
                        fmt.Println("parse error:", err)
                        break
                }
                fmt.Printf("%3d: %+v\n", i, ev)
        }
}

got:

  1: {T:2017-11-02 16:00:00 -0400 -0400 Desc:window opened}
  2: {T:2017-11-02 16:30:00 -0400 -0400 Desc:window closed}
  3: {T:2017-11-02 16:41:34 -0400 -0400 Desc:front door opened}
parse error: invalid character 'E' looking for beginning of value

want:

  1: {T:2017-11-02 16:00:00 -0400 -0400 Desc:window opened}
  2: {T:2017-11-02 16:30:00 -0400 -0400 Desc:window closed}
  3: {T:2017-11-02 16:41:34 -0400 -0400 Desc:front door opened}
  4: {T:2017-11-02 16:41:40 -0400 -0400 Desc:front door closed}

I think the "correct" way to do this is to pre-parse the stream into individual, valid JSON documents and unmarshal each one separately, because the stream itself is not valid JSON: even without the errors, a JSON document must have a single root value, whereas this is a series of root objects. Read the stream line by line using e.g. bufio.Scanner, discard the non-JSON lines, and Unmarshal the others as normal.

See working example here: https://play.golang.org/p/DZrAVmzwr-

While not very clean, you could use the JSON decoder's Buffered method to get a reader over the data remaining in the decoder's buffer, which should start at the value that caused the error, and wrap it in a buffered reader when necessary. Then you can read individual bytes until you encounter a JSON start-object byte, {, and unread that byte (a bufio.Reader can always unread the most recently read byte) to push it back onto the buffered stream.

Playground link for code below

...
decodeLoop:
    for decodeStream.More() {
        i++
        var ev event
        if err := decodeStream.Decode(&ev); err != nil {
            r := decodeStream.Buffered()
            br, ok := r.(*bufio.Reader)
            if !ok {
                br = bufio.NewReader(r)
            }
            for {
                b, err := br.ReadByte()
                if err != nil {
                    // Whether EOF or not, there's nothing left to do except
                    // break the loop to trigger the "parse error" statement.
                    break
                }
                // A (potentially) valid JSON object was found;
                // create a new decoder associated with the same decodeStream var
                // using the new buffered reader and continue decoding.
                if b == '{' {
                    br.UnreadByte()
                    decodeStream = json.NewDecoder(br)
                    i-- // the skipped noise was not a record, so don't count it
                    continue decodeLoop
                }
            }
            fmt.Println("parse error:", err)
            break
        }
        ...

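However, this is not bulletproof as-is. Buffered returns only the bytes the decoder has already read from the original reader, so if the decoder has not yet read the whole input, the new decoder never sees the rest of it. And if the noise itself contains a { that does not start a valid object, the new decoder fails on that {, its Buffered data starts at the same {, and the loop retries it forever.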

IMHO, the proper way to handle this is to receive a single JSON array of JSON objects, which lets you tokenize each JSON object that represents an event manually by providing an UnmarshalJSON method with an *event receiver. If you cannot get that, this doesn't apply, and you'll need to modify the solution above as necessary, assuming that is possible. One possible remedy is to set a flag when a { is found and unset it once a JSON object has decoded successfully:

    objectDetected := false
    i := 0
decodeLoop:
    ...
                if b == '{' {
                    // If we already encountered an object and found ourselves here again,
                    // it's not really a valid JSON object.
                    if objectDetected {
                        break
                    }
                    objectDetected = true
                    br.UnreadByte()
                    ...
        fmt.Printf("%3d: %+v\n", i, ev)
        objectDetected = false
    } // decode loop end
}

Playground link