I need to get some large objects (up to 50GB) over HTTP from AWS S3 and calculate the hash of each object. I also need to get the content type and, if it is a certain type, get some metadata.
I want to do all of the above in parallel.
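For context, I'm fetching each object roughly like this (a sketch assuming the aws-sdk-go v1 API; the bucket and key names are placeholders, and credentials/region come from the environment), so the body arrives as a streaming io.Reader that I then hand to handleUpload as shown below:

package main

import (
    "log"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

func main() {
    sess := session.Must(session.NewSession())
    svc := s3.New(sess)

    // Bucket and key are placeholders for illustration.
    out, err := svc.GetObject(&s3.GetObjectInput{
        Bucket: aws.String("my-bucket"),
        Key:    aws.String("my-object"),
    })
    if err != nil {
        log.Fatal(err)
    }
    defer out.Body.Close()

    // out.Body is an io.ReadCloser that streams the object,
    // so even a 50GB object is never held in memory at once.
    handleUpload(out.Body)
}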
I am considering using an io.MultiWriter and io.Pipes (adapted from solution #5 in this article).
func handleUpload(u io.Reader) {
    // create the pipes
    contentTypeR, contentTypeW := io.Pipe()
    metaR, metaW := io.Pipe()
    hashR, hashW := io.Pipe()

    // create channel to synchronize
    done := make(chan bool)
    defer close(done)

    contentTypeCh := make(chan string)
    defer close(contentTypeCh)

    go getContentType(contentTypeR, contentTypeCh, done)
    go processMetadata(metaR, contentTypeCh, done)
    go calculateHash(hashR, done)

    go func() {
        defer contentTypeW.Close()
        defer metaW.Close()
        defer hashW.Close()

        mw := io.MultiWriter(contentTypeW, metaW, hashW)
        io.Copy(mw, u)
    }()

    // wait until all are done
    for c := 0; c < 3; c++ {
        <-done
    }
}
func getContentType(r io.Reader, contentTypeCh chan<- string, done chan<- bool) {
    lr := io.LimitReader(r, 512) // only read the first 512 bytes
    first512, err := ioutil.ReadAll(lr)
    if err != nil {
        // do something with the error
    }
    contentType := http.DetectContentType(first512)
    contentTypeCh <- contentType
    ...
}
func processMetadata(r io.Reader, contentTypeCh <-chan string, done chan<- bool) {
    contentType := <-contentTypeCh
    if !strings.HasPrefix(contentType, "image/") {
        return // don't get the metadata if the type is not an image
    }
    ...
}
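calculateHash isn't shown above; for completeness it would be roughly like this (a sketch, assuming SHA-256 via crypto/sha256 and that each worker signals done when it finishes):

func calculateHash(r io.Reader, done chan<- bool) {
    h := sha256.New()
    // Stream the pipe straight into the hash; nothing is buffered in memory.
    if _, err := io.Copy(h, r); err != nil {
        // do something with the error
    }
    log.Printf("sha256: %x", h.Sum(nil))
    done <- true
}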
My main concern is the content type detection and metadata processing functions, which are executed in goroutines. For content type detection, we only read the first 512 bytes. For metadata processing, we do nothing at all if the content type is not an image.
If the MultiWriter keeps writing data into a pipe via io.Copy() while that pipe's reader is no longer being read from, will this cause memory or resource leaks?
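To make the situation concrete, here is a minimal standalone sketch of what I mean by a pipe whose writer is never matched by a reader (io.Pipe is unbuffered, so the Write blocks until something reads or the reader is closed):

package main

import (
    "fmt"
    "io"
    "time"
)

func main() {
    pr, pw := io.Pipe()
    _ = pr // nobody ever reads from the pipe

    go func() {
        // io.Pipe has no internal buffer: this Write blocks until a
        // matching Read happens (or the reader side is closed), so
        // this goroutine stays parked here.
        pw.Write([]byte("hello"))
        fmt.Println("never reached while the reader is idle")
    }()

    time.Sleep(time.Second)
    fmt.Println("writer goroutine is presumably still blocked here")
}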