```go
import (
	"fmt"
	"log"

	"gopkg.in/xmlpath.v2"
)
...
path := xmlpath.MustCompile("//div[@id='23']")
tree, err := xmlpath.ParseHTML(reader)
if err != nil {
	log.Fatal("HTML parsing error, maybe not well-formed: ", err)
}
iter := path.Iter(tree)
for iter.Next() {
	// String() returns only the concatenated text of the node's
	// descendant text nodes, not the node's markup.
	fmt.Println(iter.Node().String())
}
...
```
Is there a way to convert `iter.Node()` back to HTML markup like `<div>...</div>`? `iter.Node().String()` returns only the values of all inner text nodes. As far as I can see, the documentation of the xmlpath package does not offer such a function.
You are right: the `gopkg.in/xmlpath.v2` functions are limited to reading the content of nodes, and there are not many alternatives in Go for working with the DOM.
Among native Go libraries, the only one I can mention is goquery. It works only with HTML and does not support XPath, but it does support CSS selectors. Maybe that would be enough in your case.
If you really need to work with both HTML and XML via XPath, there is a libxml wrapper for Go called gokogiri. It exposes libxml's functionality, so you can select nodes and get their inner/outer HTML, attributes and other things. I used it to extract text content in a service that is currently in production, and it is a bit faster than PHP's DOMDocument. The one limitation is that I'm not sure it supports Go versions newer than 1.4.x. Also, installation on Windows is a bit tricky.