I've found a few questions that are similar to mine, but nothing that answers my specific question.
I want to upload CSV data to S3. My basic code is along the lines of the following (I've simplified getting the data for brevity; normally it's reading from a database):
reader, writer := io.Pipe()
go func() {
	cWriter := csv.NewWriter(writer) // write CSV rows into the pipe
	for _, line := range lines {
		cWriter.Write(line)
	}
	cWriter.Flush()
	writer.Close()
}()
sess := session.New(//...)
uploader := s3manager.NewUploader(sess)
result, err := uploader.Upload(&s3manager.UploadInput{
	Body: reader,
	//...
})
The way I understand it, the code will wait for writing to finish and then upload the contents to S3, so I end up with the full contents of the file in memory. Is it possible to chunk the upload (possibly using the S3 multipart upload?) so that for larger uploads I'm only holding part of the data in memory at any one time?
The uploader supports multipart upload, if I've read its source code correctly: https://github.com/aws/aws-sdk-go/blob/master/service/s3/s3manager/upload.go
The minimum size of an uploaded part is 5 MB.
// MaxUploadParts is the maximum allowed number of parts in a multi-part upload
// on Amazon S3.
const MaxUploadParts = 10000
// MinUploadPartSize is the minimum allowed part size when uploading a part to
// Amazon S3.
const MinUploadPartSize int64 = 1024 * 1024 * 5
// DefaultUploadPartSize is the default part size to buffer chunks of a
// payload into.
const DefaultUploadPartSize = MinUploadPartSize
u := &Uploader{
	PartSize:       DefaultUploadPartSize,
	MaxUploadParts: MaxUploadParts,
	.......
}
func (u Uploader) UploadWithContext(ctx aws.Context, input *UploadInput, opts ...func(*Uploader)) (*UploadOutput, error) {
	i := uploader{in: input, cfg: u, ctx: ctx}
	.......

func (u *uploader) nextReader() (io.ReadSeeker, int, error) {
	.............
	switch r := u.in.Body.(type) {
	.........
	default:
		part := make([]byte, u.cfg.PartSize)
		n, err := readFillBuf(r, part)
		u.readerPos += int64(n)
		return bytes.NewReader(part[0:n]), n, err
	}
}
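So for a non-seekable Body such as the io.Pipe reader, the uploader falls through to that default case, buffers one PartSize chunk at a time, and uploads it as a part, which means only roughly PartSize * Concurrency bytes are held in memory, not the whole file. A minimal sketch of how you might wire that up and tune those knobs for your case (the bucket name, key, sample rows, and the PartSize/Concurrency values are placeholder assumptions, not anything from your setup):

package main

import (
	"encoding/csv"
	"io"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	// Stand-in for the rows that would normally come from the database.
	lines := [][]string{{"id", "name"}, {"1", "alice"}, {"2", "bob"}}

	reader, writer := io.Pipe()
	go func() {
		cWriter := csv.NewWriter(writer)
		for _, line := range lines {
			if err := cWriter.Write(line); err != nil {
				writer.CloseWithError(err) // surface the error on the reader side
				return
			}
		}
		cWriter.Flush()
		writer.Close()
	}()

	sess := session.Must(session.NewSession())

	// Control how much is buffered in memory: roughly PartSize * Concurrency.
	uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = s3manager.MinUploadPartSize // 5 MB parts (placeholder choice)
		u.Concurrency = 2                        // two parts in flight at once (placeholder choice)
	})

	_, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("my-bucket"), // placeholder
		Key:    aws.String("data.csv"),  // placeholder
		Body:   reader,                  // non-seekable, so it's read and uploaded part by part
	})
	if err != nil {
		log.Fatal(err)
	}
}

Concurrency defaults to 5, so lowering it trades throughput for a smaller memory footprint; either way the pipe means the full CSV is never materialized in memory or on disk.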