For a given XML document, I want to extract all the text content except that inside certain tags, apply some transformation to it, and put the modified content back at the positions it was extracted from.
I tried to generate a tree (say, a nested map) for the document and, after the transformation, rebuild the document from that tree.
However, I have not found any library I can use for this in Go.
Is this possible?
update:
The structure of the XML document is not fixed.
Basically there are three approaches I can think of:
Define a set of Go types matching the elements of your XML document. Then unmarshal the document into a hierarchy of values of these types, apply whatever updates are needed to those values, and marshal them back to an XML document.
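A minimal sketch of this first approach, assuming a small fixed document shape. The Note type, its fields and the upperBody helper are hypothetical names made up for illustration; only xml.Unmarshal and xml.Marshal come from the standard library.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Note is a hypothetical type matching
// <note><title>..</title><body>..</body></note>.
type Note struct {
	XMLName xml.Name `xml:"note"`
	Title   string   `xml:"title"`
	Body    string   `xml:"body"`
}

// upperBody unmarshals the document, upper-cases the body text
// and marshals the value back to XML.
func upperBody(doc string) (string, error) {
	var n Note
	if err := xml.Unmarshal([]byte(doc), &n); err != nil {
		return "", err
	}
	n.Body = strings.ToUpper(n.Body)
	out, err := xml.Marshal(n)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := upperBody(`<note><title>Hi</title><body>some text</body></note>`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Anything in the input that the types do not describe is dropped on the way back out, so this only fits documents whose structure is known in advance.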
The upside of this approach is that it's "standard" (it requires just the encoding/xml standard package) and is "data-driven".
The downsides are many:
Do "SAX-style" processing: encoding/xml allows you to step through each XML node as the parser decodes it.
So it's possible to create a Decoder reading the source document and an Encoder producing the resulting one. Each token decoded by the decoder gets encoded by the encoder either right away or after certain processing on your side (which may result in adding more tokens).
Unfortunately, if you need to maintain some context between visiting different tokens (say, to modify only the text nodes of elements that are on certain paths in the document), this is not easy to do (though still possible, of course).
Use full-blown XSLT processing.
You may look at this.