I have a massive 2.5 GB CSV file with roughly 25 million records and about 20 columns. I am trying to use Go to process this monster, do some formatting, and then insert the rows into a database. I have this basic setup with channels because I figured goroutines would make it the fastest: here

The problem is that because the inserts block, my channel just gets stuffed with an insane amount of data, and before I know it my memory is out of control, so the program fails before any processing or inserting gets done.

Could someone help me out with this code so that the queue is built up from reading the file while processing and inserting happen at the same time?
You start a new goroutine for every record of your big CSV file. Each goroutine allocates a stack of roughly 2 kB, so with ~25 million records that alone is on the order of 50 GB. Starting a goroutine for everything is not recommended.
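For illustration, the pattern being described looks roughly like this; a hypothetical sketch, not the asker's actual code, with `big.csv` and `process` as placeholder names:

```go
// Hypothetical sketch of the one-goroutine-per-record pattern described
// above; not the asker's actual code.
package main

import (
	"encoding/csv"
	"io"
	"log"
	"os"
)

// process is a placeholder for formatting and inserting a single record.
func process(rec []string) { /* format + insert one record */ }

func main() {
	f, err := os.Open("big.csv") // placeholder file name
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r := csv.NewReader(f)
	for {
		rec, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		// One goroutine per record: ~25 million goroutines, each with its
		// own stack, pile up far faster than the database can drain them.
		go process(rec)
	}
}
```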
Try to use a pipeline instead: the main goroutine reads the records and sends them through channel1.

You start e.g. 10 worker goroutines that process the records received from channel1 and send the processed values through channel2.

Then another 10 goroutines receive the values from channel2 and insert them into the database.

Here are some examples of pipelines.
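A minimal sketch of that three-stage setup, assuming a placeholder file name `big.csv` and placeholder `format`/`insert` functions for the per-record work:

```go
// Minimal pipeline sketch: one reader, 10 formatting workers, 10 insert
// workers. File name and the format/insert functions are placeholders.
package main

import (
	"encoding/csv"
	"io"
	"log"
	"os"
	"sync"
)

// format is a placeholder for whatever per-record formatting is needed.
func format(rec []string) []string { return rec }

// insert is a placeholder for the actual database insert.
func insert(rec []string) { _ = rec }

func main() {
	f, err := os.Open("big.csv") // placeholder file name
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	records := make(chan []string, 100)   // channel1: raw CSV records
	processed := make(chan []string, 100) // channel2: formatted records

	// Stage 1: a single reader goroutine feeds channel1.
	go func() {
		defer close(records)
		r := csv.NewReader(f)
		for {
			rec, err := r.Read()
			if err == io.EOF {
				return
			}
			if err != nil {
				log.Fatal(err)
			}
			records <- rec // blocks when the buffer is full, so memory stays bounded
		}
	}()

	// Stage 2: 10 workers format records and feed channel2.
	var procWG sync.WaitGroup
	for i := 0; i < 10; i++ {
		procWG.Add(1)
		go func() {
			defer procWG.Done()
			for rec := range records {
				processed <- format(rec)
			}
		}()
	}
	go func() {
		procWG.Wait()
		close(processed)
	}()

	// Stage 3: 10 workers insert the processed records into the database.
	var insWG sync.WaitGroup
	for i := 0; i < 10; i++ {
		insWG.Add(1)
		go func() {
			defer insWG.Done()
			for rec := range processed {
				insert(rec)
			}
		}()
	}
	insWG.Wait()
}
```

The small channel buffers give the reader a little headroom, but because a send blocks once the buffer is full, the reader can never run far ahead of the database inserts, so memory stays bounded no matter how big the file is.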