I tried to fetch the HTML source of Reddit with Go:
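    package main

    import (
        "fmt"
        "io"
        "log"
        "net/http"
        "time"
    )

    func main() {
        // Timeout caps how long a single request may take.
        client := http.Client{Timeout: 5 * time.Second}

        resp, err := client.Get("https://www.reddit.com/")
        if err != nil {
            log.Fatal(err)
        }
        // Defer the close right after the error check, before reading the body.
        defer resp.Body.Close()

        body, err := io.ReadAll(resp.Body)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("HTML:\n", string(body))

        // Wait for Enter so the console window stays open.
        var input string
        fmt.Scanln(&input)
    }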
The first attempt worked, but the second time I got this response instead:
<p>we're sorry, but you appear to be a bot and we've seen too many requests
from you lately. we enforce a hard speed limit on requests that appear to come
from bots to prevent abuse.</p>
<p>if you are not a bot but are spoofing one via your browser's user agent
string: please change your user agent string to avoid seeing this message
again.</p>
<p>please wait 6 second(s) and try again.</p>
<p>as a reminder to developers, we recommend that clients make no
more than <a href="http://github.com/reddit/reddit/wiki/API">one
request every two seconds</a> to avoid seeing this message.</p>
I tried adding a delay, but it still does not work. Sorry for my bad English.
Reddit doesn't want automated scanners or grabbers on their site and has a bot-protection mechanism. Here is their recommendation:
one request every two seconds
Just add a delay between requests. The client's Timeout serves a different purpose: it is an upper limit on how long a single request may take, not a pause between requests. What you need is a sleep between subsequent requests:

    time.Sleep(6 * time.Second)