I have a large number of XML files to parse that contain unclosed tags wrapped in closed tags, something like this:
```xml
<submission>
<first-name>Henry
<last-name>Donald
<id>4224
</submission>
```
I set decoder.Strict = false, but the decoder still does not parse the entire XML file properly. Here is my code:
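```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// sub is the sample document shown above.
const sub = `<submission>
<first-name>Henry
<last-name>Donald
<id>4224
</submission>`

type Submission struct {
	FirstName string `xml:"first-name"`
	LastName  string `xml:"last-name"`
	ID        string `xml:"id"`
}

func main() {
	dec := xml.NewDecoder(bytes.NewReader([]byte(sub)))
	dec.Strict = false                // allow common mistakes such as missing end tags
	dec.AutoClose = xml.HTMLAutoClose // HTML void elements such as <br>
	dec.Entity = xml.HTMLEntity       // HTML named entities such as &nbsp;
	var s Submission
	err := dec.Decode(&s)
	if err != nil {
		fmt.Println(err)
	}
	fmt.Println(s)
}
```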
Playground: https://play.golang.org/p/-_chEpDhzX
I know there is an HTML tokenizer I could try, but I would prefer to use the encoding/xml package, since the majority of the files are properly formatted.
There's no way around it; you need your own decoder: http://play.golang.org/p/Kr7nq64f-c
The following worked for me, although it is probably only practical if you know in advance which tags are left unclosed. Strangely, it stops working if I also add first-name.
```go
// Added after dec.AutoClose = xml.HTMLAutoClose; AutoClose is only honoured when dec.Strict is false.
dec.AutoClose = append(dec.AutoClose, "last-name", "id")
```