I'm using the package golang.org/x/net/html to scrape data out of HTML pages, and this has been working fine so far. However, I don't know how to extract data from a drop-down list like this:
<!DOCTYPE html>
<html>
<body>
<select name="car" size="1" id="car">
<option value="volvo">Volvo</option>
<option value="saab">Saab</option>
<option value="vw">VW</option>
<option value="audi" selected>Audi</option>
</select>
<select name="animal" size="1" id="animal">
<option value="dog">Dog</option>
<option value="cat" selected>Cat</option>
<option value="badger">Badger</option>
<option value="mouse">Mouse</option>
</select>
I want to extract the pre-selected options, so the result becomes this:
car = audi
animal = cat
How can I accomplish this? If golang.org/x/net/html is not capable of doing what I want, what else can I use to extract the data?
You absolutely can do it with golang.org/x/net/html:
package main
import (
"fmt"
"golang.org/x/net/html"
"strings"
)
func main() {
s := "html" // placeholder: put the HTML document to parse here
result := make(map[string]string)
d := html.NewTokenizer(strings.NewReader(s))
currID := ""
for {
tokenType := d.Next()
if tokenType == html.ErrorToken {
break
}
token := d.Token()
switch tokenType {
case html.StartTagToken:
if token.Data == "select" {
for _, a := range token.Attr {
if a.Key == "id" {
currID = a.Val
}
}
}
if token.Data == "option" {
isSelected := false
for _, a := range token.Attr {
if a.Key == "selected" {
isSelected = true
}
}
if isSelected {
for _, a := range token.Attr {
if a.Key == "value" {
result[currID] = a.Val
}
}
}
}
}
}
fmt.Printf("%v\n", result)
}
P.S. this code can be improved.
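Maybe use gokogiri for XPath selectors (note that text() selects the visible label, e.g. Audi; select @value instead if you want the value attribute, e.g. audi):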
car, _ := doc.Search("//select[@id='car']/option[@selected]/text()")
animal, _ := doc.Search("//select[@id='animal']/option[@selected]/text()")