How to efficiently replace string occurrences between two string delimiters using Go bytes?
For example, my flat file (3 MB) has content similar to:
Lorem START ipsum END dolor sit amet, START adipiscing END elit.
Ipsum dolor START sit END amet, START elit. END
.....
I would like to replace all occurrences between the START and END delimiters. Since my file is 3 MB, I think it is a bad idea to load the whole content into memory.
Thanks.
You can use bufio.Scanner with bufio.ScanWords to tokenize on whitespace boundaries, and compare the non-whitespace sequences to your delimiters:
scanner := bufio.NewScanner(reader)
scanner.Split(bufio.ScanWords) // you can implement your own split function,
                               // but ScanWords will suffice for your example
for scanner.Scan() {
	// scanner.Bytes() efficiently exposes the file contents
	// as slices of a larger buffer
	if bytes.HasPrefix(scanner.Bytes(), []byte("START")) {
		// ... keep scanning until the end delimiter
	}
	// copying unmodified input is quite simple
	// (note: ScanWords drops the whitespace between tokens,
	// so write a separator yourself if you need one):
	_, err := writer.Write(scanner.Bytes())
	if err != nil {
		return err
	}
}
This ensures that the amount of file data buffered in memory at any time remains bounded (by default the limit is bufio.MaxScanTokenSize).
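To make the loop above concrete, here is a minimal sketch of the scanner approach. It assumes the delimiters appear as standalone whitespace-separated tokens and that the delimiters themselves should be kept; the names replaceBetween and replaceBetweenString are made up for this example. Because ScanWords discards whitespace, the output joins tokens with single spaces, so line breaks in the input are not preserved.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// replaceBetween (hypothetical helper) copies whitespace-separated tokens
// from r to w and replaces everything between a START token and the next
// END token with repl. The delimiters themselves are kept.
func replaceBetween(r io.Reader, w io.Writer, repl []byte) error {
	scanner := bufio.NewScanner(r)
	scanner.Split(bufio.ScanWords)
	bw := bufio.NewWriter(w)

	inside := false // true while we are between START and END
	first := true   // true until the first token has been written
	emit := func(tok []byte) {
		if !first {
			bw.WriteByte(' ') // ScanWords dropped the original whitespace
		}
		first = false
		bw.Write(tok) // write errors are sticky and reported by Flush
	}

	for scanner.Scan() {
		tok := scanner.Bytes() // only valid until the next call to Scan
		switch {
		case !inside && bytes.Equal(tok, []byte("START")):
			inside = true
			emit(tok)
			emit(repl)
		case inside && bytes.Equal(tok, []byte("END")):
			inside = false
			emit(tok)
		case !inside:
			emit(tok)
		}
		// tokens seen while inside are skipped
	}
	if err := scanner.Err(); err != nil {
		return err
	}
	return bw.Flush()
}

// replaceBetweenString is a small convenience wrapper for trying it out.
func replaceBetweenString(s, repl string) (string, error) {
	var out bytes.Buffer
	err := replaceBetween(strings.NewReader(s), &out, []byte(repl))
	return out.String(), err
}

func main() {
	out, err := replaceBetweenString("Lorem START ipsum END dolor sit amet, START adipiscing END elit.", "X")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

If the original whitespace matters, a custom split function (or the in-memory approach) is probably a better fit than ScanWords.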
Note that if you want to use multiple goroutines, you'll need to copy the data first, since scanner.Bytes() returns a slice that is only valid until the next call to .Scan(). If you choose to do that, though, I wouldn't bother with a scanner.
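As an illustration of the copying requirement, the following sketch scans in one goroutine and hands each token to another over a channel. The names tokens and collect are invented for this example, and scanner.Err() is deliberately ignored to keep it short.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// tokens (hypothetical helper) scans r in a separate goroutine and sends
// each word over a channel. Every token is copied first, because the slice
// returned by scanner.Bytes() may be overwritten by the next call to Scan.
func tokens(r io.Reader) <-chan []byte {
	ch := make(chan []byte)
	go func() {
		defer close(ch)
		scanner := bufio.NewScanner(r)
		scanner.Split(bufio.ScanWords)
		for scanner.Scan() {
			tok := make([]byte, len(scanner.Bytes()))
			copy(tok, scanner.Bytes())
			ch <- tok
		}
		// scanner.Err() is ignored in this sketch
	}()
	return ch
}

// collect gathers the tokens of s as strings, for demonstration only.
func collect(s string) []string {
	var out []string
	for tok := range tokens(strings.NewReader(s)) {
		out = append(out, string(tok))
	}
	return out
}

func main() {
	fmt.Println(collect("Ipsum dolor START sit END amet,"))
}
```

The per-token allocation is the price of sharing data across goroutines, which is part of why a scanner stops being attractive in that setup.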
For what it's worth, loading a 3 MB file into memory is not a problem on a general-purpose computer nowadays; I would only think twice if it were an order of magnitude bigger. It would almost certainly be faster to use bytes.Split with your delimiters.
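A rough sketch of that in-memory variant, assuming the whole file is already in a byte slice and that the delimiters should be kept. The function names replaceInMemory and replaceInMemoryString are made up here, and a START without a following END is left untouched.

```go
package main

import (
	"bytes"
	"fmt"
)

// replaceInMemory (hypothetical helper) replaces the text between each
// START and the next END with repl, keeping the delimiters themselves.
func replaceInMemory(data, repl []byte) []byte {
	start, end := []byte("START"), []byte("END")

	// Each part after the first begins right after a START delimiter.
	parts := bytes.Split(data, start)

	var out bytes.Buffer
	out.Write(parts[0]) // text before the first START
	for _, p := range parts[1:] {
		out.Write(start)
		if i := bytes.Index(p, end); i >= 0 {
			out.Write(repl)
			out.Write(p[i:]) // END and everything after it
		} else {
			out.Write(p) // no matching END: leave the text as is
		}
	}
	return out.Bytes()
}

// replaceInMemoryString is a small convenience wrapper for trying it out.
func replaceInMemoryString(s, repl string) string {
	return string(replaceInMemory([]byte(s), []byte(repl)))
}

func main() {
	in := "Lorem START ipsum END dolor sit amet, START adipiscing END elit."
	fmt.Println(replaceInMemoryString(in, " X "))
	// prints "Lorem START X END dolor sit amet, START X END elit."
}
```

In a real program the data would typically come from something like os.ReadFile. Unlike the ScanWords version, this keeps all whitespace outside the delimited regions intact.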