Extracting text from an HTML page in Go

I'm looking for a simple way to get the text of a web page, preferably without having to resort to a bunch of regular expressions.

Just thought I'd check first in case this kind of thing is already built in, or at least easier to do in Go.

You could use goquery. This library offers a jQuery-like API for selecting elements in an HTML document and extracting their text.

This example is taken from the project's GitHub page:

package main

import (
    "fmt"
    "github.com/PuerkitoBio/goquery"
    "log"
)

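func ExampleScrape() {
    // Fetch and parse the page. Newer goquery releases deprecate NewDocument
    // in favor of NewDocumentFromReader, but the call still works.
    doc, err := goquery.NewDocument("http://metalsucks.net")
    if err != nil {
        log.Fatal(err)
    }
    // Visit every element matching the CSS selector and print its text.
    doc.Find(".reviews-wrap article .review-rhs").Each(func(i int, s *goquery.Selection) {
        band := s.Find("h3").Text()
        title := s.Find("i").Text()
        fmt.Printf("Review %d: %s - %s\n", i, band, title)
    })
}
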
func main() {
    ExampleScrape()
}