What would be an efficient (in both performance and readability) way of parsing lines in a log file and extracting points of interest?
For example:
*** Time: 2/1/2019 13:51:00
17.965 Pump 10 hose FF price level 1 limit 0.0000 authorise pending (Type 00)
17.965 Pump 10 State change LOCKED_PSTATE to CALLING_PSTATE [31]
38.791 Pump 10 delivery complete, Hose 1, price 72.9500, level 1, value 100.0000, volume 1.3700, v-total 8650924.3700, m-total 21885705.8800, T13:51:38
Things I need to extract are the pump number (10 for pump 10), the price level, the limit, the _PSTATE changes, the values from the delivery complete line, etc.
Currently I'm using a separate regular expression with capture groups for each kind of line, but it feels inefficient and there is quite a bit of duplication.
For example, I have a bunch of these:
    reStateChange := regexp.MustCompile(`^(?P<offset>.*) Pump (?P<pump>\d+) State change (?P<oldstate>\w+)_PSTATE to (?P<newstate>\w+)_PSTATE`)
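Then, inside the for loop that reads the file line by line:
    // text is the current line; result (map[string]string) and matched (bool)
    // are declared before the loop.
    if match := reStateChange.FindStringSubmatch(text); match != nil {
        matched = true
        for i, name := range reStateChange.SubexpNames() {
            if name != "" { // skip the unnamed whole-match entry at index 0
                result[name] = match[i]
            }
        }
    } else if match := otherReMatch.FindStringSubmatch(text); match != nil {
        matched = true
        // use the SubexpNames of the regexp that actually matched
        for i, name := range otherReMatch.SubexpNames() {
            if name != "" {
                result[name] = match[i]
            }
        }
    } else if strings.Contains(text, "*** Time:") {
        // handle the timestamp header line
    }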
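If you want to keep regular expressions, much of the duplication can go away with a table-driven layout: keep the patterns in a slice, try them in order, and share one helper that copies the named groups using the SubexpNames of the regexp that actually matched. This is a minimal sketch; `linePattern`, `namedGroups`, `parseLine` and the delivery pattern are hypothetical names chosen for illustration, and both state groups here exclude the `_PSTATE` suffix.

```go
package main

import (
	"fmt"
	"regexp"
)

// linePattern pairs a hypothetical line kind with the regexp that recognises it.
type linePattern struct {
	kind string
	re   *regexp.Regexp
}

// patterns is tried in order; the first pattern that matches wins.
// The delivery pattern is an illustrative assumption based on the sample line.
var patterns = []linePattern{
	{"state", regexp.MustCompile(`^(?P<offset>\S+) Pump (?P<pump>\d+) State change (?P<oldstate>\w+)_PSTATE to (?P<newstate>\w+)_PSTATE`)},
	{"delivery", regexp.MustCompile(`^(?P<offset>\S+) Pump (?P<pump>\d+) delivery complete, Hose (?P<hose>\d+), price (?P<price>[\d.]+), level (?P<level>\d+)`)},
}

// namedGroups copies every named capture group of match into a map,
// using the SubexpNames of the same regexp that produced the match.
func namedGroups(re *regexp.Regexp, match []string) map[string]string {
	result := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" { // index 0 (whole match) and unnamed groups have no name
			result[name] = match[i]
		}
	}
	return result
}

// parseLine returns the kind of the first matching pattern and its named
// groups, or ok == false when no pattern matches.
func parseLine(text string) (kind string, fields map[string]string, ok bool) {
	for _, p := range patterns {
		if match := p.re.FindStringSubmatch(text); match != nil {
			return p.kind, namedGroups(p.re, match), true
		}
	}
	return "", nil, false
}

func main() {
	kind, fields, ok := parseLine("17.965 Pump 10 State change LOCKED_PSTATE to CALLING_PSTATE [31]")
	fmt.Println(kind, fields["pump"], fields["oldstate"], fields["newstate"], ok)
}
```

Adding a new kind of line then means adding one entry to `patterns` rather than another `else if` branch.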
It feels like there should be a much better way to do this. I would trade some performance for readability: the log files are 10 MB at most, and often smaller.
I'm after some suggestions on how to make this better in golang.
If all your log lines are similar to the sample you posted, they are quite structured, so regular expressions might be overkill and hard to generalize.
Another option would be to transform each of those lines into a slice of strings ([]string) using strings.Fields, or strings.FieldsFunc if you want to split on both whitespace and commas.
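A minimal sketch of the strings.FieldsFunc variant: the separator function treats any whitespace rune or a comma as a split point, so "Hose 1, price 72.9500," becomes separate tokens with no trailing commas. The helper name `splitLogLine` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitLogLine breaks a log line into tokens, treating both whitespace
// and commas as separators. Runs of separators produce no empty tokens.
func splitLogLine(text string) []string {
	return strings.FieldsFunc(text, func(r rune) bool {
		return unicode.IsSpace(r) || r == ','
	})
}

func main() {
	line := "38.791 Pump 10 delivery complete, Hose 1, price 72.9500, level 1, value 100.0000"
	// Print each token with its index, which is the position a processor would check.
	for i, tok := range splitLogLine(line) {
		fmt.Println(i, tok)
	}
}
```

Printing the indices like this is a quick way to find the positions each processor should look at.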
Then you can design an interface like:
    type LogLineProcessor interface {
        CanParse(line []string) bool
        GetResultFrom(line []string) LogLineResult
    }
Here LogLineResult is a struct containing the extracted information.
You can then define multiple structs with methods that implement LogLineProcessor. Each implementation would look at specific positions in that []string to decide whether it is a line it can process, for example by looking for the words "hose", "FF" and "price" in the positions where it expects to find them.
The GetResultFrom implementations would then extract each data point from specific positions in the []string; they can rely on that information being there, because CanParse has already determined that the line is one they can process.
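A sketch of position-based extraction for the "delivery complete" line, assuming the tokens come from splitting on whitespace and commas. CanParse guarantees the keywords are where they should be, but a number can still be malformed, so unlike the interface signature above, this hypothetical `GetResultFrom` also returns an error. `Delivery`, `deliveryProcessor`, `tokens` and `numParser` are illustrative names, and the chosen fields are an assumption based on the single sample line.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// Delivery holds the values of a "delivery complete" line (assumed field set).
type Delivery struct {
	Pump   int
	Hose   int
	Price  float64
	Level  int
	Value  float64
	Volume float64
}

// tokens splits a line on whitespace and commas.
func tokens(text string) []string {
	return strings.FieldsFunc(text, func(r rune) bool {
		return unicode.IsSpace(r) || r == ','
	})
}

// numParser converts strings to numbers and remembers the first error,
// so a run of conversions needs a single error check at the end.
type numParser struct{ err error }

func (p *numParser) atoi(s string) int {
	n, err := strconv.Atoi(s)
	if err != nil && p.err == nil {
		p.err = err
	}
	return n
}

func (p *numParser) float(s string) float64 {
	f, err := strconv.ParseFloat(s, 64)
	if err != nil && p.err == nil {
		p.err = err
	}
	return f
}

// deliveryProcessor expects, after tokens(): [0]offset [1]Pump [2]10
// [3]delivery [4]complete [5]Hose [6]1 [7]price [8]72.9500 [9]level [10]1
// [11]value [12]100.0000 [13]volume [14]1.3700 ...
type deliveryProcessor struct{}

// CanParse checks the keywords at the positions where they are expected.
func (deliveryProcessor) CanParse(line []string) bool {
	return len(line) >= 15 &&
		line[1] == "Pump" && line[3] == "delivery" && line[4] == "complete" &&
		line[5] == "Hose" && line[7] == "price" && line[9] == "level" &&
		line[11] == "value" && line[13] == "volume"
}

// GetResultFrom reads the values by position and returns the first
// conversion error, if any.
func (deliveryProcessor) GetResultFrom(line []string) (Delivery, error) {
	var p numParser
	d := Delivery{
		Pump:   p.atoi(line[2]),
		Hose:   p.atoi(line[6]),
		Price:  p.float(line[8]),
		Level:  p.atoi(line[10]),
		Value:  p.float(line[12]),
		Volume: p.float(line[14]),
	}
	return d, p.err
}

func main() {
	line := tokens("38.791 Pump 10 delivery complete, Hose 1, price 72.9500, level 1, value 100.0000, volume 1.3700, v-total 8650924.3700, m-total 21885705.8800, T13:51:38")
	var proc deliveryProcessor
	if proc.CanParse(line) {
		d, err := proc.GetResultFrom(line)
		fmt.Printf("%+v err=%v\n", d, err)
	}
}
```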
You can create a var processors []LogLineProcessor, put all your processors in it, and then simply iterate over that slice:
    line := strings.Fields(text)
    for _, processor := range processors {
        if processor.CanParse(line) {
            result := processor.GetResultFrom(line)
            // do whatever is needed with the result
        }
    }