在Go中高效读写CSV

The Go code below reads in a 10,000 record CSV (of timestamp times and float values), runs some operations on the data, and then writes the original values to another CSV along with an additional column for score. However it is terribly slow (i.e. hours, but most of that is calculateStuff()) and I'm curious if there are any inefficiencies in the CSV reading/writing I can take care of.

package main

import (
  "encoding/csv"
  "log"
  "os"
  "strconv"
)

func ReadCSV(filepath string) ([][]string, error) {
  csvfile, err := os.Open(filepath)

  if err != nil {
    return nil, err
  }
  defer csvfile.Close()

  reader := csv.NewReader(csvfile)
  fields, err := reader.ReadAll()

  return fields, nil
}

func main() {
  // load data csv
  records, err := ReadCSV("./path/to/datafile.csv")
  if err != nil {
    log.Fatal(err)
  }

  // write results to a new csv
  outfile, err := os.Create("./where/to/write/resultsfile.csv"))
  if err != nil {
    log.Fatal("Unable to open output")
  }
  defer outfile.Close()
  writer := csv.NewWriter(outfile)

  for i, record := range records {
    time := record[0]
    value := record[1]

    // skip header row
    if i == 0 {
      writer.Write([]string{time, value, "score"})
      continue
    }

    // get float values
    floatValue, err := strconv.ParseFloat(value, 64)
    if err != nil {
      log.Fatal("Record: %v, Error: %v", floatValue, err)
    }

    // calculate scores; THIS EXTERNAL METHOD CANNOT BE CHANGED
    score := calculateStuff(floatValue)

    valueString := strconv.FormatFloat(floatValue, 'f', 8, 64)
    scoreString := strconv.FormatFloat(prob, 'f', 8, 64)
    //fmt.Printf("Result: %v
", []string{time, valueString, scoreString})

    writer.Write([]string{time, valueString, scoreString})
  }

  writer.Flush()
}

I'm looking for help making this CSV read/write template code as fast as possible. For the scope of this question we need not worry about the calculateStuff method.

You're loading the file in memory first then processing it, that can be slow with a big file.

You need to loop and call .Read and process one line at a time.

func processCSV(rc io.Reader) (ch chan []string) {
    ch = make(chan []string, 10)
    go func() {
        r := csv.NewReader(rc)
        if _, err := r.Read(); err != nil { //read header
            log.Fatal(err)
        }
        defer close(ch)
        for {
            rec, err := r.Read()
            if err != nil {
                if err == io.EOF {
                    break
                }
                log.Fatal(err)

            }
            ch <- rec
        }
    }()
    return
}

playground

//note it's roughly based on DaveC's comment.

encoding/csv is indeed very slow on big files, as it performs a lot of allocations. Since your format is so simple I recommend using strings.Split instead which is much faster.

If even that is not fast enough you can consider implementing the parsing yourself using strings.IndexByte which is implemented in assembly: http://golang.org/src/strings/strings_decl.go?s=274:310#L1

Having said that, you should also reconsider using ReadAll if the file is larger than your memory.

This is essentially Dave C's answer from the comments sections:

package main

import (
  "encoding/csv"
  "log"
  "os"
  "strconv"
)

func main() {
  // setup reader
  csvIn, err := os.Open("./path/to/datafile.csv")
  if err != nil {
    log.Fatal(err)
  }
  r := csv.NewReader(csvIn)

  // setup writer
  csvOut, err := os.Create("./where/to/write/resultsfile.csv"))
  if err != nil {
    log.Fatal("Unable to open output")
  }
  w := csv.NewWriter(csvOut)
  defer csvOut.Close()

  // handle header
  rec, err := r.Read()
  if err != nil {
    log.Fatal(err)
  }
  rec = append(rec, "score")
  if err = w.Write(rec); err != nil {
    log.Fatal(err)
  }

  for {
    rec, err = r.Read()
    if err != nil {
      if err == io.EOF {
        break
      }
      log.Fatal(err)
    }

    // get float value
    value := rec[1]
    floatValue, err := strconv.ParseFloat(value, 64)
    if err != nil {
      log.Fatal("Record, error: %v, %v", value, err)
    }

    // calculate scores; THIS EXTERNAL METHOD CANNOT BE CHANGED
    score := calculateStuff(floatValue)

    scoreString := strconv.FormatFloat(score, 'f', 8, 64)
    rec = append(rec, scoreString)

    if err = w.Write(rec); err != nil {
      log.Fatal(err)
    }
  w.Flush()
  }
}

Note of course the logic is all jammed into main(), better would be to split it into several functions, but that's beyond the scope of this question.