I'm working with EPUBs in Go, and I need to fetch the cover image from the cover.xhtml file (or whichever file the .opf file names as the cover).
My problem is that the element structure of cover.xhtml is not fixed.
Each EPUB structures its cover.xhtml file differently. For example,
<body>
<figure id="cover-image">
<img src="covers/9781449328030_lrg.jpg" alt="First Edition" />
</figure>
</body>
Another EPUB's cover.xhtml file:
<body>
<div>
<img src="@public@vhost@g@gutenberg@html@files@54869@54869-h@images@cover.jpg" alt="Cover" />
</div>
</body>
I need to fetch the src attribute of the img tag from this file, but I haven't been able to.
Here is the part of my code that unmarshals the cover.xhtml file:
type CPSRCS struct {
Src string `xml:"src,attr"`
}
type CPIMGS struct {
Image CPSRCS `xml:"img"`
}
XMLContent, err = ioutil.ReadFile("./uploads/moby-dick/OPS/cover.xhtml")
CheckError(err)
coverFile := CPIMGS{}
err = xml.Unmarshal(XMLContent, &coverFile)
CheckError(err)
fmt.Println(coverFile)
The output is:
{{}}
The output I'm expecting is:
{{covers/9781449328030_lrg.jpg}}
Thanks in advance!
This will pull the img element out of the file that was read in and then unmarshal the src attribute from that element. It assumes that you will only ever need the first img element in the file.
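XMLContent, err = ioutil.ReadFile("./uploads/moby-dick/OPS/cover.xhtml")
CheckError(err)
// Parse the XMLContent to grab just the first img element
strContent := string(XMLContent)
imgLoc := strings.Index(strContent, "<img")
if imgLoc == -1 {
// Without this guard the slice below would panic (requires the "log" import)
log.Fatal("no <img element found in cover file")
}
prefixRem := strContent[imgLoc:]
endImgLoc := strings.Index(prefixRem, "/>")
if endImgLoc == -1 {
log.Fatal("img element is not self-closed with '/>'")
}
// Move over by 2 to include the closing '/>'
trimmed := prefixRem[:endImgLoc+2]
var coverFile CPSRCS
err = xml.Unmarshal([]byte(trimmed), &coverFile)
CheckError(err)
fmt.Println(coverFile)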
This will produce the result of {covers/9781449328030_lrg.jpg} for the first input file and {@public@vhost@g@gutenberg@html@files@54869@54869-h@images@cover.jpg} for the second input file you provided.