I'm building an application that will download roughly 5,000 CSV files in parallel, using goroutines and plain old HTTP GET requests.
I'm currently running into the open file limit imposed by OS X.
The CSV files are served over HTTP. Are there any other network protocols I could use to batch the requests into one? I don't have access to the server, so I can't zip the files up there. I'd also rather not change the ulimit, because I probably won't have access to that configuration once the app is in production.
You probably want to limit active concurrent requests to a more sensible number than 5,000. Spin up 10 to 20 workers and feed the individual URLs to them over a channel.
The HTTP client will reuse connections across requests, as long as you always read the entire response body and close it.
Something like this:
package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"sync"
)

var ch = make(chan string)
var wg sync.WaitGroup

func main() {
	// Allow enough idle connections per host to be reused by the workers.
	http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 100

	// Start a fixed pool of workers.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go worker()
	}

	var csvs = []string{"http://example.com/a.csv", "http://example.com/b.csv"}
	for _, u := range csvs {
		ch <- u
	}
	close(ch)
	wg.Wait()
}

func worker() {
	defer wg.Done()
	for u := range ch {
		get(u)
	}
}

func get(u string) {
	resp, err := http.Get(u)
	if err != nil {
		log.Println(err)
		return
	}
	// Always drain and close the body so the connection can be reused.
	// Deferred calls run last-in first-out: Copy drains what's left,
	// then Close releases the connection back to the pool.
	defer resp.Body.Close()
	defer io.Copy(ioutil.Discard, resp.Body)

	// Read and decode the CSV here; make sure to consume the entire body.
}
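If you'd rather avoid a fixed worker pool, a buffered channel used as a counting semaphore gives the same cap on in-flight requests. This is just a minimal sketch of that variant, not something from the original answer; the URLs and the limit of 10 are placeholder assumptions:

package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"sync"
)

// sem acts as a counting semaphore: only as many goroutines as the
// buffer size (10, chosen arbitrarily here) can hold a slot at once.
var sem = make(chan struct{}, 10)

func main() {
	var wg sync.WaitGroup
	urls := []string{"http://example.com/a.csv", "http://example.com/b.csv"} // placeholders
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot before opening a connection
			defer func() { <-sem }() // release it when done
			resp, err := http.Get(u)
			if err != nil {
				log.Println(err)
				return
			}
			defer resp.Body.Close()
			io.Copy(ioutil.Discard, resp.Body) // drain so the connection is reused
		}(u)
	}
	wg.Wait()
}

This launches one goroutine per URL but gates the actual HTTP calls (and therefore the open file descriptors) on the semaphore, so you never exceed the cap no matter how many URLs you queue.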