I am looking at the documented example here, but it is iterating purely over an XML tree, and not HTML. Therefore, I am still partly confused.
For example, if I wanted to find a specific meta tag within the head tag by name, it seems I cannot? Instead, I need to find it by the order it is in the head tag. In this case, I want the 8th meta tag, which I assume is:
headTag, err := getByID(xmlroot, "/head/meta[8]/")
But of course, this is using a getByID function for a tag name - which I don't believe will work. What is the full list of "getBy..." commands?
Then the problem is: how do I access the meta tag's attributes? The documentation only provides examples for a node's inner content. Will this example work?
resp.Query = extractValue(headTag, "@content")
The @ selector confuses me, is this appropriate for this case?
Thank you very much!
XPath does not seem suitable here; you should be using goquery, which is designed for HTML.
Here is an example:
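package main

import (
	"fmt"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	// NewDocument fetches and parses the URL. Newer goquery releases deprecate it
	// in favor of http.Get plus goquery.NewDocumentFromReader.
	doc, err := goquery.NewDocument("https://example.com")
	if err != nil {
		panic(err)
	}

	// Select the meta tag by its name attribute rather than by position.
	s := doc.Find(`html > head > meta[name="viewport"]`)
	if s.Length() == 0 {
		fmt.Println("could not find viewport")
		return
	}

	// Print the content attribute of the first match ("" if it is absent).
	fmt.Println(s.Eq(0).AttrOr("content", ""))
}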
I know this answer is late, but I still want to recommend the htmlquery package, which is simple and powerful and is based on XPath expressions.
The code below is based on @Time-Cooper's example.
package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	doc, err := htmlquery.LoadURL("https://example.com")
	if err != nil {
		panic(err)
	}

	// Select the meta element by its name attribute, then read its content attribute.
	s := htmlquery.Find(doc, "//meta[@name='viewport']")
	if len(s) == 0 {
		fmt.Println("could not find viewport")
		return
	}
	fmt.Println(htmlquery.SelectAttr(s[0], "content"))

	// Alternative, simpler method: select the attribute node directly with /@content.
	s2 := htmlquery.FindOne(doc, "//meta[@name='viewport']/@content")
	if s2 != nil {
		fmt.Println(htmlquery.InnerText(s2))
	}
}