I'm retrieving an XML string from an external web service that I do not control. Some of the data contains leading whitespace, for example `<data> I have leading white space</data>`. How do I trim the whitespace from each element within the XML string?
You can use the primitives in the `encoding/xml` package to modify an XML stream on the fly. In this case, implementing `xml.TokenReader` is a simple solution:
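```go
import (
	"bytes"
	"encoding/xml"
)

// Trimmer wraps an xml.Decoder and implements xml.TokenReader.
type Trimmer struct {
	dec *xml.Decoder
}

// Token returns the next token from the underlying decoder, with
// leading and trailing whitespace removed from text (CharData) nodes.
func (tr Trimmer) Token() (xml.Token, error) {
	t, err := tr.dec.Token()
	if cd, ok := t.(xml.CharData); ok {
		t = xml.CharData(bytes.TrimSpace(cd))
	}
	return t, err
}
```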
`Trimmer` wraps an underlying decoder and returns a modified token stream. `xml.CharData` represents text nodes. Whenever one is encountered, `bytes.TrimSpace` is called to trim leading and trailing whitespace. All other tokens are returned unmodified.
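If you want the result back as an XML string rather than decoded into a struct, one option is to feed `Trimmer`'s tokens straight into an `xml.Encoder`. The sketch below does that; `trimXML` is a hypothetical helper name, and `Trimmer` is repeated so the program compiles on its own. Because the output is re-serialized rather than edited in place, it need not match the input byte-for-byte apart from the whitespace (for example, `<a/>` comes back as `<a></a>`).

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Trimmer trims leading and trailing whitespace from every text node.
type Trimmer struct {
	dec *xml.Decoder
}

func (tr Trimmer) Token() (xml.Token, error) {
	t, err := tr.dec.Token()
	if cd, ok := t.(xml.CharData); ok {
		t = xml.CharData(bytes.TrimSpace(cd))
	}
	return t, err
}

// trimXML (hypothetical helper) re-encodes the trimmed token stream
// so the result is an XML string again.
func trimXML(in string) (string, error) {
	tr := Trimmer{xml.NewDecoder(strings.NewReader(in))}
	var out strings.Builder
	enc := xml.NewEncoder(&out)
	for {
		t, err := tr.Token()
		if err == io.EOF {
			break // end of input
		}
		if err != nil {
			return "", err
		}
		// Encode each token right away: CharData slices are only
		// valid until the next call to Token.
		if err := enc.EncodeToken(t); err != nil {
			return "", err
		}
	}
	// The encoder buffers its output, so flush before reading it.
	if err := enc.Flush(); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	s, err := trimXML("<root><data> I have leading white space</data></root>")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // <root><data>I have leading white space</data></root>
}
```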
`xml.NewTokenDecoder` turns `Trimmer` back into a regular `Decoder`:
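```go
import (
	"encoding/xml"
	"fmt"
	"io"
	"log"
)

var r io.Reader // data source, e.g. the response body from the web service

raw := xml.NewDecoder(r)                 // regular decoder
dec := xml.NewTokenDecoder(Trimmer{raw}) // trimming decoder

var v MyType // your own struct type
if err := dec.Decode(&v); err != nil {
	log.Fatal(err)
}
fmt.Printf("%+v\n", v)
```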
This approach is fragile and may not work, depending on what the data looks like, but if the data is predictable and never contains `>` anywhere except in XML tags, you could strip the whitespace that follows each `>`:
https://play.golang.org/p/4YSpvLFwHjZ
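```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Match a '>' followed by any run of whitespace.
	r := regexp.MustCompile(`>\s*`)
	xml := "<test> hello</test><test> There</test><test>!</test>"
	// Replace each match with a bare '>', dropping the whitespace.
	xml = r.ReplaceAllString(xml, ">")
	fmt.Println(xml) // <test>hello</test><test>There</test><test>!</test>
}
```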
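The one-liner above only removes whitespace after a `>`, so trailing whitespace inside an element (`<test>hello </test>`) survives. Under the same assumption that `<` and `>` only appear as tag delimiters, a second pattern can handle the other side. `trimAroundTags` is a hypothetical helper name. Be aware that this also joins mixed content, so `<p>a <b>b</b></p>` would become `<p>a<b>b</b></p>`.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// afterTag matches whitespace immediately following a '>'.
	afterTag = regexp.MustCompile(`>\s+`)
	// beforeTag matches whitespace immediately preceding a '<'.
	beforeTag = regexp.MustCompile(`\s+<`)
)

// trimAroundTags (hypothetical helper) strips whitespace on both sides
// of every tag. It assumes '<' and '>' only ever appear as tag delimiters.
func trimAroundTags(s string) string {
	s = afterTag.ReplaceAllString(s, ">")
	return beforeTag.ReplaceAllString(s, "<")
}

func main() {
	s := "<test> hello </test>\n<test> There\t</test><test>!</test>"
	fmt.Println(trimAroundTags(s)) // <test>hello</test><test>There</test><test>!</test>
}
```\n<test> There\t</test><test>!</test>"); got != "<test>hello</test><test>There</test><test>!</test>" {
	panic("unexpected result: " + got)
}
</test>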