Background
user@host curl -s http://stackoverflow.com | grep -m 1 stackoverflow.com
returns immediately if the string is found:
<meta name="twitter:domain" content="stackoverflow.com"/>
Aim
Find a string on a website using Go.
Method
Based on sources from Go by Example and Schier's Blog, the following code was created:
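package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"regexp"
)

func main() {
	url := "http://stackoverflow.com"
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	// Read the whole response body into memory.
	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	// QuoteMeta escapes the dot so that it matches only a literal ".".
	r := regexp.MustCompile(regexp.QuoteMeta("stackoverflow.com"))
	fmt.Println(r.FindString(string(body)))
}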
Results
Running the code results in:
stackoverflow.com
Discussion & Conclusions
The following code implements grep, stopping at the first line that contains the given string. It avoids reading the entire webpage into memory at once by using a bufio.Scanner, which bounds memory use and may also speed up the program when the string is found near the start of a huge page. It compares against scan.Bytes() so that not every line is converted into a string, which would cause significant memory churn; only the matching line is converted, via scan.Text().
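package main

import (
	"bufio"
	"bytes"
	"fmt"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("http://stackoverflow.com")
	if err != nil {
		log.Fatalf("failed to open url: %v", err)
	}
	defer resp.Body.Close()

	toFind := []byte("stackoverflow.com")
	// Scan the body line by line instead of reading it all at once.
	scan := bufio.NewScanner(resp.Body)
	for scan.Scan() {
		// Compare bytes directly; convert to a string only for the match.
		if bytes.Contains(scan.Bytes(), toFind) {
			fmt.Println(scan.Text())
			return
		}
	}
	// Scan also stops on a read error or an over-long line; report that.
	if err := scan.Err(); err != nil {
		log.Fatalf("failed to read response: %v", err)
	}
}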