How can I re-use sessions for HTTP fetches and S3 store operations?

I need to fetch the contents of many URLs and store them in AWS S3. I've written a function that does this, and it works, but I'm looking to make it faster and more efficient by re-using the HTTP client connection and re-using the AWS session. Furthermore, I'd like the transfers to run concurrently, say 5 at a time.

import (
    "bytes"
    "io/ioutil"
    "log"
    "net/http"
    "net/url"
    "time"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

func fetchPut(fromURL string, toS3 string) error {
    start := time.Now()
    resp, err := http.Get(fromURL)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    // Session and client are created anew on every call.
    sess := session.Must(session.NewSession())
    s3svc := s3.New(sess)

    s3URL, err := url.Parse(toS3)
    if err != nil {
        return err
    }

    byteArray, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        return err
    }
    log.Printf("fetch took %.2fs", time.Since(start).Seconds())

    start = time.Now()
    input := &s3.PutObjectInput{
        Body:   bytes.NewReader(byteArray),
        Bucket: aws.String(s3URL.Host),
        Key:    aws.String(s3URL.Path),
    }
    _, err = s3svc.PutObject(input)
    log.Printf("put took %.2fs", time.Since(start).Seconds())

    return err
}

What I don't understand is how I can re-use the sessions (both HTTP and AWS). Can I keep them in a global variable? Or do I have to create some sort of context?

Are there any good examples of this sort of use case to study?

Your problem seems to be pretty general.

As a general principle, separate the things that don't change (the session and AWS service object, and the fixed part of the destination, such as the bucket name) from the things that do (the source URL and the varying part of the destination, such as the key). Set up the non-changing configuration once, then run the URL fetch + S3 store concurrently, passing the shared configuration as an additional argument.

That boils down to moving the s3svc creation out of the fetchPut function and passing it in as an argument, then running fetchPut in goroutines, possibly with a sync.WaitGroup if you want to wait for all of them to finish. A sketch follows below.
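A minimal sketch of that refactor, assuming the AWS SDK for Go v1 used in the question; the job map, the s3:// URL layout, and the limit of 5 are illustrative. The HTTP client and the S3 client are created once and shared (both are documented as safe for concurrent use), and a buffered channel acts as a semaphore capping the number of in-flight transfers:

package main

import (
    "bytes"
    "fmt"
    "io/ioutil"
    "net/http"
    "net/url"
    "sync"
    "time"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

// fetchPut now receives the shared clients instead of creating them.
func fetchPut(httpc *http.Client, s3svc *s3.S3, fromURL, toS3 string) error {
    resp, err := httpc.Get(fromURL)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        return err
    }

    s3URL, err := url.Parse(toS3)
    if err != nil {
        return err
    }

    _, err = s3svc.PutObject(&s3.PutObjectInput{
        Body:   bytes.NewReader(body),
        Bucket: aws.String(s3URL.Host),
        Key:    aws.String(s3URL.Path),
    })
    return err
}

func main() {
    // Created once, shared by every goroutine.
    httpc := &http.Client{Timeout: 30 * time.Second}
    s3svc := s3.New(session.Must(session.NewSession()))

    jobs := map[string]string{ // fromURL -> toS3, illustrative
        "https://example.com/a": "s3://my-bucket/a",
        "https://example.com/b": "s3://my-bucket/b",
    }

    sem := make(chan struct{}, 5) // at most 5 transfers in flight
    var wg sync.WaitGroup
    for from, to := range jobs {
        wg.Add(1)
        go func(from, to string) {
            defer wg.Done()
            sem <- struct{}{}        // acquire a slot
            defer func() { <-sem }() // release it
            if err := fetchPut(httpc, s3svc, from, to); err != nil {
                fmt.Println(from, err)
            }
        }(from, to)
    }
    wg.Wait()
}

For a one-shot batch the semaphore is simpler than a fixed worker pool; for a long-running service, workers reading from a jobs channel work just as well.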

Another variation would be to run two pools of workers: producers (fetching URLs) and consumers (putting to S3), with a channel between them so one pool can feed the other, as sketched below. That would probably give the most speedup.
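A rough sketch of that two-pool pipeline, with illustrative pool sizes, bucket name, and input; errors are only printed, to keep the example short. Closing jobs, waiting for the producers, and only then closing fetched gives a clean shutdown:

package main

import (
    "bytes"
    "fmt"
    "io/ioutil"
    "net/http"
    "sync"
    "time"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

// job is a source URL plus a destination key; the bucket is fixed.
type job struct{ url, key string }

// object is a fetched body waiting to be stored.
type object struct {
    key  string
    body []byte
}

func main() {
    httpc := &http.Client{Timeout: 30 * time.Second}
    s3svc := s3.New(session.Must(session.NewSession()))
    bucket := "my-bucket" // illustrative

    jobs := make(chan job)
    fetched := make(chan object)

    // Producers: fetch URLs and feed the bodies to the consumers.
    var producers sync.WaitGroup
    for i := 0; i < 5; i++ {
        producers.Add(1)
        go func() {
            defer producers.Done()
            for j := range jobs {
                resp, err := httpc.Get(j.url)
                if err != nil {
                    fmt.Println(j.url, err)
                    continue
                }
                body, err := ioutil.ReadAll(resp.Body)
                resp.Body.Close()
                if err != nil {
                    fmt.Println(j.url, err)
                    continue
                }
                fetched <- object{key: j.key, body: body}
            }
        }()
    }

    // Consumers: store fetched objects in S3.
    var consumers sync.WaitGroup
    for i := 0; i < 5; i++ {
        consumers.Add(1)
        go func() {
            defer consumers.Done()
            for o := range fetched {
                _, err := s3svc.PutObject(&s3.PutObjectInput{
                    Body:   bytes.NewReader(o.body),
                    Bucket: aws.String(bucket),
                    Key:    aws.String(o.key),
                })
                if err != nil {
                    fmt.Println(o.key, err)
                }
            }
        }()
    }

    // Feed the pipeline, then shut it down in order.
    jobs <- job{"https://example.com/a", "/a"} // illustrative
    close(jobs)
    producers.Wait()
    close(fetched)
    consumers.Wait()
}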

In general, I agree with your idea of making it concurrent - it's a good, mind-stretching exercise and doesn't have to be considered premature optimization. I also can't resist advertising Rob Pike's excellent talk "Concurrency Is Not Parallelism". Rob's load-balancer example is more complicated than your case, but it still gives a good overview of how to process requests concurrently.

Btw, "session" used for http fetch is kind of transparent; as the commenters already mentioned, http client from standard library will be reused and you don't have to worry about that.