I'm trying to create a function that can parse strings consisting of an uppercase word followed by zero or more arguments enclosed in double quotes.
For example, each of the following lines:
COPY "filename one" "filename two"
REMOVE "filename"
LIST "x" "y" "z"
DISCONNECT
The result should be a string (the command) followed by a []string (the arguments inside the quotes). I created the following regular expression:
re1, _ := regexp.Compile(`([A-Z]+)(?: "([^"]+)")*`)
results := re1.FindAllStringSubmatch(input, -1)
However, no matter what I try, only the last argument gets captured.
An example of my problem: https://play.golang.org/p/W1rE1X4SWf5
"arg1" is not captured in this example. What am I missing?
If your commands are well defined, i.e. command names are always uppercase and arguments always come after the command, then a looser regex might fit your use case:
re1, _ := regexp.Compile(`([A-Z]+)|(?: "([^"]+)")`)
results := re1.FindAllStringSubmatch(`COMMAND "arg1" "arg2" "arg3"`, -1)
fmt.Println("Command:", results[0][1])
for _, arg := range results[1:] {
fmt.Println("Arg:", arg[2])
}
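To get the shape asked for in the question (the command plus a []string of arguments), the looser regex can be wrapped in a small function. This is a sketch under stated assumptions: parseCommand is a hypothetical name, and it assumes well-formed input where the command is the first match.

```go
package main

import (
	"fmt"
	"regexp"
)

// cmdRe uses alternation: each match is either the command word (group 1)
// or one quoted argument with its leading space (group 2 holds the text inside the quotes),
// so every argument produces its own match instead of overwriting a repeated group.
var cmdRe = regexp.MustCompile(`([A-Z]+)|(?: "([^"]+)")`)

// parseCommand is a hypothetical helper; it assumes the command is the first match.
func parseCommand(input string) (string, []string) {
	matches := cmdRe.FindAllStringSubmatch(input, -1)
	if len(matches) == 0 || matches[0][1] == "" {
		return "", nil // input does not start with a command word
	}
	args := []string{}
	for _, m := range matches[1:] {
		if m[1] == "" { // only keep matches of the argument alternative
			args = append(args, m[2])
		}
	}
	return matches[0][1], args
}

func main() {
	for _, line := range []string{
		`COPY "filename one" "filename two"`,
		`REMOVE "filename"`,
		`LIST "x" "y" "z"`,
		`DISCONNECT`,
	} {
		cmd, args := parseCommand(line)
		fmt.Printf("%s %q\n", cmd, args)
	}
}
```

Returning an empty (non-nil) slice for DISCONNECT keeps the caller from having to special-case commands without arguments.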
When a capture group is repeated, only its last match is captured. I'd do it in two steps: first split the command from the arguments, then parse the arguments.
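The behaviour can be reproduced with the regex from the question. In Go's regexp package, a capture group inside a repeated (?:...)* group reports only its last iteration, so group 2 ends up holding just the final argument. The input string here is an assumption chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// questionRe is the regex from the question: group 2 sits inside a repeated (?:...)* group.
var questionRe = regexp.MustCompile(`([A-Z]+)(?: "([^"]+)")*`)

func main() {
	m := questionRe.FindStringSubmatch(`COMMAND "arg1" "arg2" "arg3"`)
	// m[0] is the whole match, m[1] the command, and m[2] only the last
	// iteration of the repeated group; "arg1" and "arg2" are not kept anywhere.
	fmt.Printf("%q\n", m)
}
```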
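Splitting into command and arguments can be done with ([A-Z]+)((?: "[^"]+")*) (demo):
- ([A-Z]+) in the first group gives you the command;
- ((?: "[^"]+")*) in the second group gives you the arguments in quotes, separated by spaces.
Then you can use FindAllString with "([^"]+)" on the second group to extract the arguments (demo). FindAllString returns each argument with its quotes; FindAllStringSubmatch gives you the text without quotes in group 1.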
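The two-step approach can be sketched in Go as follows. splitCommand is a hypothetical helper name; it uses FindAllStringSubmatch in the second step so that the returned arguments do not include the quotes.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Step 1: group 1 is the command, group 2 is the whole run of quoted arguments.
	splitRe = regexp.MustCompile(`([A-Z]+)((?: "[^"]+")*)`)
	// Step 2: group 1 of each match is the text inside one pair of quotes.
	argRe = regexp.MustCompile(`"([^"]+)"`)
)

// splitCommand is a hypothetical helper; ok is false when no command word is found.
func splitCommand(input string) (cmd string, args []string, ok bool) {
	m := splitRe.FindStringSubmatch(input)
	if m == nil {
		return "", nil, false
	}
	args = []string{}
	for _, a := range argRe.FindAllStringSubmatch(m[2], -1) {
		args = append(args, a[1])
	}
	return m[1], args, true
}

func main() {
	cmd, args, ok := splitCommand(`COPY "filename one" "filename two"`)
	fmt.Println(ok, cmd, args)
}
```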
I think this may solve your problem:
re1 := regexp.MustCompile(`^([A-Z]+) *`)
commandArgsRegex := regexp.MustCompile(`"([^"]+)"`)
commandText := `COPY "filename one" "filename two"`
// loc holds byte offsets: loc[0]:loc[1] is the whole match (command plus trailing spaces),
// loc[2]:loc[3] is the command alone.
if loc := re1.FindStringSubmatchIndex(commandText); loc != nil {
	fmt.Println("Command =", commandText[loc[2]:loc[3]])
	commandArgs := commandText[loc[1]:]
	// Group 1 of each match is the argument without its quotes.
	for i, arg := range commandArgsRegex.FindAllStringSubmatch(commandArgs, -1) {
		fmt.Println("arg", i, "=", arg[1])
	}
} else {
	fmt.Println("Failed")
}
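Add an extra capture group for each argument. If you make the groups optional, the unused ones will be empty but the match will still succeed:
re1, _ := regexp.Compile(`^([A-Z]+)(?:\s"([^"]+)")?(?:\s"([^"]+)")?(?:\s"([^"]+)")?$`)
Add more (?:\s"([^"]+)")? groups up to the maximum number of arguments you need. I used three because the longest example in your question (LIST) has three arguments. Because every argument group is optional, a command with no arguments, such as DISCONNECT, also matches.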
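A minimal sketch of the fixed-number-of-groups approach, assuming at most three arguments and a regex in which every argument group is optional and captures only the text inside the quotes. parseFixed is a hypothetical helper name; it drops the empty groups left by absent arguments.

```go
package main

import (
	"fmt"
	"regexp"
)

// fixedRe allows at most three arguments; each argument group is optional
// and captures only the text between the quotes.
var fixedRe = regexp.MustCompile(`^([A-Z]+)(?:\s"([^"]+)")?(?:\s"([^"]+)")?(?:\s"([^"]+)")?$`)

// parseFixed is a hypothetical helper; ok is false for malformed input
// or for more arguments than there are groups.
func parseFixed(input string) (string, []string, bool) {
	m := fixedRe.FindStringSubmatch(input)
	if m == nil {
		return "", nil, false
	}
	args := []string{}
	for _, g := range m[2:] {
		if g != "" { // groups for absent arguments are empty strings
			args = append(args, g)
		}
	}
	return m[1], args, true
}

func main() {
	for _, line := range []string{`LIST "x" "y" "z"`, `DISCONNECT`, `LIST "a" "b" "c" "d"`} {
		cmd, args, ok := parseFixed(line)
		fmt.Println(ok, cmd, args)
	}
}
```

The trade-off is that the regex grows with the maximum argument count and silently rejects longer commands, which the alternation and two-step approaches avoid.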