I have a 4.5-million-line XML file, and I cannot figure out how to parse the information out of it using the decoder.DecodeElement() function.
A snippet of the XML:
<dt
xmlns:directive="d"
xmlns:ref="ref">
<Data>
<directive:Entry Name='abcd'>
<list>
<map>
<directive:Entry Name='id'>
<Integer>21</Integer>
</directive:Entry>
<directive:Entry Name='t'>
<Date>T14:31:43.823Z</Date>
</directive:Entry>
</map>
</list>
</directive:Entry>
</Data>
</dt>
So the above constitutes one line of the XML file. My goal is to extract 't' and 'id'.
My current attempt involves creating a struct:
type DT struct {
	id string `xml:"Data"` // This is my attempt to get the entire Data portion/segment/chunk(?)
}
The code to perform the actual decoding:
decoder := xml.NewDecoder(readInFile())
for {
	t, _ := decoder.Token()
	if t == nil {
		break
	}
	switch se := t.(type) {
	case xml.StartElement:
		inE := se.Name.Local
		if inE == "dt" {
			var dt DT
			decoder.DecodeElement(&dt, &se)
			fmt.Println(&dt)
		}
	}
}
The above code, when run, outputs
&{}
Which tells me that no information can be parsed out. The same is true if I output
fmt.Println(&dt.id)
Could someone please help me? I am not sure whether my output is blank because of the way I defined the struct or because of an issue with my decoding.
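Instead of using decoder.DecodeElement(), I would recommend you use xml.Unmarshal. For xml.Unmarshal to do what you want, the structure of the DT type has to match the structure of the <dt> element, following the rules given in the encoding/xml documentation for Unmarshal. Note also that encoding/xml only sets exported fields, so a lowercase field such as id is never populated.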
For example something like this:
type DT struct {
	DataEntry struct {
		List []EntryMap `xml:"list"`
	} `xml:"Data>Entry"`
}

type EntryMap struct {
	Entries []Entry `xml:"map>Entry"`
}

type Entry struct {
	Name  string `xml:",attr"`
	Value string `xml:",any"`
}
You can then loop over dt.DataEntry.List[N].Entries to get what you need.
I'm using the xmlquery library to parse and extract data from XML documents.
package main

import (
	"fmt"
	"strings"

	"github.com/antchfx/xmlquery"
)

func main() {
	var s = `<dt
xmlns:directive="d"
xmlns:ref="ref">
<Data>
<directive:Entry Name='abcd'>
<list>
<map>
<directive:Entry Name='id'>
<Integer>21</Integer>
</directive:Entry>
<directive:Entry Name='t'>
<Date>T14:31:43.823Z</Date>
</directive:Entry>
</map>
</list>
</directive:Entry>
</Data>
</dt>`
	doc, err := xmlquery.Parse(strings.NewReader(s))
	if err != nil {
		panic(err)
	}
	id := xmlquery.FindOne(doc, "//directive:Entry[@Name='id']/Integer")
	fmt.Println(id.InnerText())
	t := xmlquery.FindOne(doc, "//directive:Entry[@Name='t']/Date")
	fmt.Println(t.InnerText())
}
It's simple and easy to use.