HTML validation with Golang

Within my API I have a POST endpoint. One of the parameters posted to that endpoint is a block of (loosely) valid HTML.

The POST body is JSON.

In Go, how can I ensure that the posted HTML is valid? I have been looking for something for a few days now and still haven't managed to find anything.

The term "valid" is kind of loose. I'm trying to ensure that tags are opened and closed, quotation marks are in the right places, etc.

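You can check that the posted HTML blob parses correctly using html.Parse from the golang.org/x/net/html package. For validation only, all you have to do is check the returned error. Be aware that this parser follows the HTML5 error-recovery rules, so malformed markup such as mismatched or unclosed tags is repaired rather than reported; in practice an error usually means the input could not be read at all.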

A bit late to the game, but here are a couple of parsers in Go that will work if you just want to validate the structure of the HTML (e.g. you don't care whether a div is inside a span, which is not allowed but is a schema-level problem):

x/net/html

The golang.org/x/net/html package contains a very loose parser. Almost any input will be accepted as valid HTML, similar to what many web browsers do (e.g. in many cases it ignores problems such as unescaped characters). For example, something like <span>></span> will likely validate (I didn't check this particular one, I just made it up) as a span containing the '>' character.

It can be used something like this:

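import (
    "io"
    "strings"

    "golang.org/x/net/html"
)

// validateHTML returns nil once the tokenizer reaches the end of s
// (e.g. validateHTML(`<span>></span>`)), or the first error it hits.
func validateHTML(s string) error {
    z := html.NewTokenizer(strings.NewReader(s))
    for {
        tt := z.Next()
        if tt == html.ErrorToken {
            err := z.Err()
            if err == io.EOF {
                // Not an error, we're done and it's valid!
                return nil
            }
            return err
        }
    }
}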

encoding/xml

If you need something a bit stricter that still works for HTML, you can configure an xml.Decoder to handle HTML (this is what I do; it lets me be flexible about how strict to be in any given situation):

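import (
    "encoding/xml"
    "io"
    "strings"
)

// validateHTML reports the first syntax error the decoder finds, or nil.
func validateHTML(s string) error {
    d := xml.NewDecoder(strings.NewReader(s))

    // Configure the decoder for HTML. For XHTML, leave Strict at its
    // default (true) and don't set AutoClose.
    d.Strict = false
    d.AutoClose = xml.HTMLAutoClose
    d.Entity = xml.HTMLEntity
    for {
        _, err := d.Token()
        switch err {
        case io.EOF:
            return nil // We're done, it's valid!
        case nil:
            // Keep reading tokens.
        default:
            return err // Oops, something wasn't right
        }
    }
}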
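The decoder configuration above mentions an XHTML variant. A minimal sketch that makes both configurations explicit, assuming you want to pick one per request; `validateMarkup` and its `xhtml` flag are hypothetical names. In HTML mode, void elements listed in xml.HTMLAutoClose (such as br) are closed automatically; in XHTML mode the decoder's strict defaults apply, so <br> must be written <br/> and every end tag must match the element it closes. xml.HTMLEntity is kept in both modes so entities such as &nbsp; resolve.

```go
package main

import (
	"encoding/xml"
	"io"
	"strings"
)

// validateMarkup (hypothetical) returns nil if s tokenizes without a syntax
// error. xhtml=true keeps the decoder's strict defaults; xhtml=false turns
// Strict off and auto-closes HTML void elements.
func validateMarkup(s string, xhtml bool) error {
	d := xml.NewDecoder(strings.NewReader(s))
	if !xhtml {
		d.Strict = false
		d.AutoClose = xml.HTMLAutoClose
	}
	// Resolve HTML named entities such as &nbsp; in both modes.
	d.Entity = xml.HTMLEntity
	for {
		if _, err := d.Token(); err != nil {
			if err == io.EOF {
				return nil // reached the end without a syntax error
			}
			return err
		}
	}
}
```

For example, `<p>a<br>b</p>` passes with xhtml=false but fails with xhtml=true, while `<p>a<br/>b</p>` passes in both.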