I have a txt file with the following sample data:
host{
Entry {
id: "foo"
}
Entry {
id: "bar"
}
}
port{
Entry {
id: "lorem"
}
Entry {
id: "ipsum"
}
}
It has more than 300 of those Entry values. I'd like to read the file and extract the id values belonging to the port section. It's not valid JSON, so I can't use the JSON decoder. Is there another way to extract the values?
If the structure is the same throughout and all you want is the id values, you can do something like this (on the Playground):
package main
import (
"fmt"
"strings"
)
func main() {
// This will work only if ids don't have spaces
fields := strings.Fields(input1)
for i, field := range fields {
if field == "id:" {
fmt.Println("Got an id: ", fields[i+1][1:len(fields[i+1])-1])
}
}
fmt.Println()
// This will extract all strings enclosed in ""
for i1, i2 := 0, 0;; {
i := strings.Index(input2[i1:], "\"") // find the first " starting after the last match
if i >= 0 { // strings.Index returns -1 when there is no match; 0 is a valid position
i1 = i + 1 + i1 // set the start index to the absolute position in the string
i2 = strings.Index(input2[i1:], "\"") // find the second "
fmt.Println(input2[i1 : i1+i2]) // print the string between ""
i1 += i2 + 1 // set the new starting index to after the last match
} else { // otherwise we are done
break
}
}
// Reading the text line by line and only processing port sections
parts := []string{"port{", " Entry {", " id: \"foo bar\"", " }", " Entry {", " id: \"more foo bar\"", " }", "}"}
isPortSection := false
for _, part := range parts {
if strings.HasPrefix(part, "port") {
isPortSection = true
}
if strings.HasPrefix(part, "host") {
isPortSection = false
}
if isPortSection && strings.HasPrefix(strings.TrimSpace(part),"id:") {
line := strings.TrimSpace(part)
fmt.Println(line[5:len(line)-1])
}
}
}
var input1 string = `port{
Entry {
id: "foo"
}
Entry {
id: "bar"
}
}`
var input2 string = `port{
Entry {
id: "foo bar"
}
Entry {
id: "more foo bar"
}
}`
Prints:
Got an id:  foo
Got an id:  bar

foo bar
more foo bar
foo bar
more foo bar
Instead of printing them in the loop, you can put them into a slice or map, or do whatever else you need. And of course, instead of using string literals, you would read the lines from your file.
I believe text/scanner might be very useful here. It's not plug-and-play, but it lets you tokenise the input and parses strings properly (spaces, escaped values, etc.). A quick proof of concept: a scanner with a simple state machine that captures all id: {str} patterns found in the Entry blocks of the port section:
var s scanner.Scanner
s.Init(strings.NewReader(src))
// Keep state of parsing process
const (
StateNone = iota
StateID
StateIDColon
)
state := StateNone
lastToken := "" // last token text
sections := []string{} // section stack
tok := s.Scan()
for tok != scanner.EOF {
txt := s.TokenText()
switch txt {
case "id":
if state == StateNone {
state = StateID
} else {
state = StateNone
}
case ":":
if state == StateID {
state = StateIDColon
} else {
state = StateNone
}
case "{":
// Add section
sections = append(sections, lastToken)
case "}":
// Remove section
if len(sections) > 0 {
sections = sections[0 : len(sections)-1]
}
default:
if state == StateIDColon && len(sections) > 0 && sections[0] == "port" {
// Our string is here (TokenText keeps the surrounding quotes)
fmt.Println(txt)
}
state = StateNone
}
lastToken = txt
tok = s.Scan()
}
You can play with it here. This surely needs more work if you need to validate the input structure, etc., but it seems like a good starting point to me.