I am using Go to download files from one server and, after manipulating them, send them to another server.
File sizes vary from 1 MB to 200 MB.
Currently my code is pretty simple: I am using http.Client and bytes.Buffer.
Handling the big files (100 MB to 200 MB), of which there are a lot, takes a long time.
After some quick profiling, I see that most of the time is spent in bytes.(*Buffer).grow.
How can I create big buffers, for example 16 MB?
What can I do to improve the efficiency of my code? Are there general tips for handling large HTTP requests?
Edit
I will explain exactly what I am trying to do. I have CouchDB documents (with attachments) that I am trying to copy to another CouchDB instance. The documents can be from 30 MB to 200 MB in size. Copying tiny CouchDB documents (2 to 10 MB) is really fast.
But sending a large document over the wire is really slow. I am currently profiling and trying out @Evan's answer to see what my problem is.
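Take a look at the description of bytes.NewBuffer: http://golang.org/pkg/bytes/#NewBuffer
It sounds like you can create a byte slice with a 16 MB capacity (and zero length, since NewBuffer treats the slice's contents as data already in the buffer) and use it to initialize the buffer.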
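A minimal sketch of that idea follows. The helper name newBigBuffer and the constant bufCap are made up for illustration; the 16 MB figure comes from the question.

```go
package main

import "bytes"

// bufCap is the initial capacity from the question (16 MB).
const bufCap = 16 << 20

// newBigBuffer is a hypothetical helper that returns an empty buffer
// whose backing array can already hold bufCap bytes.
func newBigBuffer() *bytes.Buffer {
	// Length 0, capacity bufCap. make([]byte, bufCap) would instead
	// produce a buffer that already contains 16 MB of zero bytes.
	return bytes.NewBuffer(make([]byte, 0, bufCap))
}
```

Writes into the returned buffer should not trigger grow until more than bufCap bytes have been written.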
You could consider the fact that your program has no need to keep the data in memory if all it needs to do is copy it.
A strong feature of Go's standard library is its sensible use of interfaces: the Body member of http.Response implements the io.ReadCloser interface, and that satisfies the type of the body argument of http.Client's Post method.
So you could proceed like this:
Perform a request for the document. You'll get back an instance of http.Response, which has a Body member of type io.ReadCloser. Note that at this point you haven't actually started receiving the body from the "source" server, because to do that you have to drain the io.ReadCloser of Body.
Initiate another (presumably POST) request to send the data, and when making the request pass it the Body member obtained in the first step.
Once this request is done piping your data, call Close() on that Body member.
Something like this:
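import (
    "io"
    "net/http"
)

// myPostType is a placeholder Content-Type; set it to match your data.
const myPostType = "application/octet-stream"

func Pipe(from, to string) error {
    src, err := http.Get(from)
    if err != nil {
        return err
    }
    // http.Post streams src.Body to the destination without buffering it all.
    dst, err := http.Post(to, myPostType, src.Body)
    if err != nil {
        return err
    }
    defer dst.Body.Close()
    // Read and discard the reply so the connection can be reused.
    _, err = io.Copy(io.Discard, dst.Body)
    return err
}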
In this code, http.Post will read from src.Body and then Close() it itself.
You might add a bytes.Buffer into the mix in the hope of reducing the number of syscalls performed, but don't do that unless the plain method does not work.
As @Evan already pointed out, you can choose an initial buffer size when creating a new buffer.
Since allocating buffers is so expensive (this is why your grow calls take so long: they reallocate and copy the contents whenever the data no longer fits), picking the right buffer size is key. Picking the right strategy for buffer allocation depends on a lot of factors. You might choose your own method of growing buffers depending on your application's profile.
You should also consider recycling your buffers to prevent heap fragmentation: http://blog.cloudflare.com/recycling-memory-buffers-in-go