I want to read a CSV file containing only numeric values (with decimals) and store it in a matrix so I can perform operations on it. The file looks like this:
1.5, 2.3, 4.4
1.1, 5.3, 2.4
...
It may have thousands of lines and more than 3 columns.
I solved this using Go's encoding/csv package. It gives me a [][]string, and afterwards I use a for loop to parse that matrix into a [][]float64.
import (
	"encoding/csv"
	"os"
	"strconv"
)

func readCSV(filepath string) [][]float64 {
	csvfile, err := os.Open(filepath)
	if err != nil {
		return nil
	}
	defer csvfile.Close()

	reader := csv.NewReader(csvfile)
	// The sample data has a space after each comma, which ParseFloat rejects.
	reader.TrimLeadingSpace = true
	stringMatrix, err := reader.ReadAll()
	if err != nil {
		return nil
	}

	// Parse the string matrix into float64.
	matrix := make([][]float64, len(stringMatrix))
	for i := range stringMatrix {
		matrix[i] = make([]float64, len(stringMatrix[i]))
		for y := range stringMatrix[i] {
			matrix[i][y], err = strconv.ParseFloat(stringMatrix[i][y], 64)
			if err != nil {
				return nil
			}
		}
	}
	return matrix
}
I was wondering whether this is a correct and efficient way of doing it, or whether there is a better way, such as using reader.Read() instead and parsing each line as it is read. I'm not sure, but it feels like I'm doing a lot of duplicate work.
It all depends on how you want to use the data. Your code isn't efficient in terms of memory because you read the entire CSV content into memory (stringMatrix) and then you create another variable to hold the data converted to float64 (matrix). So if your CSV file is 1 GB in size, your program would use at least 1 GB of RAM for stringMatrix, plus a lot more for matrix.
You can optimize the code by either:
1. Reading from reader line by line and appending the data to matrix; you don't need to have the entire stringMatrix in memory at once.
2. Reading from reader line by line and processing that data line by line. Maybe you don't need to have matrix in memory either; maybe you can process the data as you read it and never have everything in memory at once. It depends on the rest of your program and on how it needs to use the CSV data.
With the second method, your program can use a small, roughly constant amount of RAM instead of gigabytes, as long as you don't need to return the entire CSV data from that function.