I'm currently learning Go and have started to rewrite a test-data-generation program I originally wrote in Java. I've been intrigued by Go's channels and goroutines, as many of the programs I've written have been focused on load testing a system and recording various metrics.
Here, I am creating some data to be written out to a CSV file. I started out by generating all of the data and then passing it off to be written to a file. I then thought I'd try implementing a channel, so data could be written while it was still being generated.
It worked - it almost eliminated the overhead of generating the data first and then writing it. However, I found this only worked if I gave the channel a buffer big enough to hold all of the test data being generated: c := make(chan string, count), where count equals the number of test data lines I am generating. With a buffer that large, the generator goroutine never blocks on a send, so it can run ahead of the file writes.
So, to my question: I'm regularly generating millions of records of test data (for load testing applications) - should I be using a channel with a buffer that large? I can't find much documentation on whether there are any restrictions on buffer size.
Running the code below with a count of 10 million completes in ~59.5s; generating all the data up front and then writing it to a file takes ~62s; using a buffer length of 1 to 100 takes ~80s.
import (
    "fmt"
    "os"
    "sync/atomic"
)

const externalRefPrefix = "Ref"
const fileName = "citizens.csv"

var counter int32 = 0

// WriteCitizensForApplication creates the CSV file and writes each line as it
// arrives on the channel, while generateCitizens produces lines concurrently.
func WriteCitizensForApplication(applicationId string, count int) {
    file, err := os.Create(fileName)
    if err != nil {
        panic(err)
    }
    defer file.Close()

    c := make(chan string, count) // buffer sized to hold every line
    go generateCitizens(applicationId, count, c)

    for line := range c {
        file.WriteString(line)
    }
}

// generateCitizens sends count CSV lines on the channel, then closes it so the
// range loop in WriteCitizensForApplication terminates.
func generateCitizens(applicationId string, count int, c chan string) {
    for i := 0; i < count; i++ {
        c <- fmt.Sprintf("%v%v\n", applicationId, generateExternalRef())
    }
    close(c)
}

// generateExternalRef returns a unique, zero-padded reference using an atomic counter.
func generateExternalRef() string {
    ref := atomic.AddInt32(&counter, 1)
    return fmt.Sprintf("%v%08d", externalRefPrefix, ref)
}