Go is pretty new to me and I have some trouble understanding its memory usage:
I want to load a file similar to CSV into an array of rows, each row being a struct composed of a 22-character key and an array of string values.
My code looks like this: https://play.golang.org/p/hJ4SHjVXaG
The problem is that for a 450 MB file it uses around 2.1 GB of memory.
Does anyone have a solution to reduce that memory use?
Update using SirDarius's solution: https://play.golang.org/p/DBmOFOkZdx. It still uses around 1.9 GB.
How many lines and fields are there in the file?
It is plausible that what you are describing is using the minimum amount of memory.
Looking at the code, I think it will use 450 MB of memory for the underlying string data.
It will then slice that up into strings. These consist of a pointer and a length, which take 16 bytes on a 64-bit platform.
So 1.5 GB / 16 bytes ≈ 93 million string headers.
So if there are more than 50 million fields in your file, the memory use seems reasonable.
There are other overheads, like the number of rows, so this isn't an exact calculation.
EDIT
Given
5 million rows, 10 columns each
That is 50 million string headers of 16 bytes, which take 800 MB. Add the string data itself (450 MB), plus the per-row overhead: each Row needs about 5 machine words (a 2-word string header for the key plus a 3-word slice header for the values), so 5 × 8 bytes × 5 million rows = 200 MB. That makes 1.45 GB.
So I don't think that even with perfect memory allocation you'll be able to reduce the usage much below 1.5 GB.
This seems pretty inefficient to me:
for _, value := range strings.Split(line[23:], ";") {
    row.Values = append(row.Values, value)
}
You basically obtain a []string by calling the strings.Split function, and then loop over that slice to append every string to another, initially nil, string slice.
Why not just do:
row.Values = strings.Split(line[23:], ";")
instead?
Though I can't guarantee it, it is possible that the loop causes each string header to be copied, and therefore makes your program use twice as much memory as needed.
You are appending the values obtained on each iteration into a Row struct, which, considering the huge file size, is not a good approach. Why are you not processing the file in batches?
Looking at the Split function, it returns a slice of substrings, so it's not necessary to range over the resulting slice and append its elements to row.Values. You can assign the result directly to row.Values, then append the row to the rows slice.
func Split(s, sep string) []string
Split slices s into all substrings separated by sep and returns a slice of the substrings between those separators. If sep is empty, Split splits after each UTF-8 sequence. It is equivalent to SplitN with a count of -1.
row.Values = strings.Split(line[23:], ";")
rows = append(rows, row)
It seems to me this is about the append() function. From the language spec:
If the capacity of s is not large enough to fit the additional values, append allocates a new, sufficiently large underlying array
The newly allocated array can be large enough to accommodate several further appends. To allocate precisely, you should use slice := make([]Row, WithExpectedCapacity) and then assign slice[n] = ... instead of calling append(). If you can't do that, you can at least try reflection to compact the slice: reflect.ValueOf(&slice).Elem().SetCap(len(slice)).
It's a bit tricky, but https://play.golang.org/p/LslkOBCvII shows that it works.