Capturing repeated groups in Go

I'm trying to create a function that can parse strings consisting of an uppercase word followed by zero or more arguments, each enclosed in double quotes.

For example, each of the following lines:

COPY "filename one" "filename two"
REMOVE "filename"
LIST "x" "y" "z"
DISCONNECT

The result should be a string (the command) followed by a []string (the arguments inside the quotes). I created the following regular expression:

re1, _ := regexp.Compile(`([A-Z]+)(?: "([^"]+)")*`)
results := re1.FindAllStringSubmatch(input, -1)

However, no matter what I try, only the last argument gets captured.

An example of my problem: https://play.golang.org/p/W1rE1X4SWf5

"arg1" is not captured in this example. What am I missing?
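The behaviour can be reproduced with a short program. The input `COMMAND "arg1" "arg2"` is a stand-in, since the playground input isn't reproduced here; the expression is the one from the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// questionRe is the expression from the question: the argument group (2)
// sits inside a repeated non-capturing group.
var questionRe = regexp.MustCompile(`([A-Z]+)(?: "([^"]+)")*`)

func main() {
	// Hypothetical input with two quoted arguments.
	m := questionRe.FindStringSubmatch(`COMMAND "arg1" "arg2"`)
	// m[1] is the command. Group 2 has a single slot, so each repetition
	// overwrites it and only the last argument ("arg2") survives.
	fmt.Println("command:", m[1])
	fmt.Println("group 2:", m[2])
}
```

Each capture group yields exactly one submatch per match, however many times the surrounding `*` repeats, so this expression on its own cannot return a variable-length list of arguments.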

If your commands are well defined, i.e. command names are always uppercase and arguments always follow the command, then a looser regex might fit your use case:

re1, _ := regexp.Compile(`([A-Z]+)|(?: "([^"]+)")`)
results := re1.FindAllStringSubmatch(`COMMAND "arg1" "arg2" "arg3"`, -1)

fmt.Println("Command:", results[0][1])
for _, arg := range results[1:] {
    fmt.Println("Arg:", arg[2])
}

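The question asks for a string plus a []string, so here is a sketch wrapping the looser regex in a function with that shape. `parseCommand` is a hypothetical name; like the answer, it assumes the first match is the command and does not validate the rest of the line.

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenRe is the looser expression from the answer: each match is either
// an uppercase word (group 1) or a space plus a quoted argument (group 2).
var tokenRe = regexp.MustCompile(`([A-Z]+)|(?: "([^"]+)")`)

// parseCommand is a hypothetical helper returning the command and its
// arguments. It takes the first match as the command and does not check
// that the line is otherwise well formed.
func parseCommand(input string) (string, []string) {
	matches := tokenRe.FindAllStringSubmatch(input, -1)
	if len(matches) == 0 || matches[0][1] == "" {
		return "", nil // no leading uppercase command
	}
	var args []string
	for _, m := range matches[1:] {
		if m[1] == "" { // skip stray uppercase words; keep quoted arguments
			args = append(args, m[2])
		}
	}
	return matches[0][1], args
}

func main() {
	for _, line := range []string{
		`COPY "filename one" "filename two"`,
		`REMOVE "filename"`,
		`LIST "x" "y" "z"`,
		`DISCONNECT`,
	} {
		cmd, args := parseCommand(line)
		fmt.Printf("%s %q\n", cmd, args)
	}
}
```

Because FindAllStringSubmatch returns one entry per match, each argument gets its own row, which sidesteps the single-slot limitation of a repeated group.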

When a capture group is repeated, only its last match is kept. I'd do it in two steps: first split the command from the arguments, then parse the arguments.

Splitting into command and arguments can be done with ([A-Z]+)((?: "[^"]+")*):

  • ([A-Z]+): in the first group, you get the command
  • ((?: "[^"]+")*): in the second group, you get the quoted arguments, separated by spaces

Then you can use FindAllString with "([^"]+)" on the second group to extract the arguments. FindAllString returns them with their quotes; use FindAllStringSubmatch and take group 1 of each match if you want them without.
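A sketch of the two-step approach in Go. `splitCommand` is a hypothetical helper, and the `^`/`$` anchors are an addition so that the whole line has to match; the answer's expression is unanchored.

```go
package main

import (
	"fmt"
	"regexp"
)

// Step 1: split the line into the command (group 1) and the run of quoted
// arguments (group 2). The anchors require the whole line to match.
var lineRe = regexp.MustCompile(`^([A-Z]+)((?: "[^"]+")*)$`)

// Step 2: pull each quoted argument out of group 2; group 1 of this
// expression is the text without the quotes.
var argRe = regexp.MustCompile(`"([^"]+)"`)

// splitCommand returns ok == false when the line does not have the
// COMMAND "arg" "arg" ... shape.
func splitCommand(input string) (cmd string, args []string, ok bool) {
	m := lineRe.FindStringSubmatch(input)
	if m == nil {
		return "", nil, false
	}
	for _, a := range argRe.FindAllStringSubmatch(m[2], -1) {
		args = append(args, a[1])
	}
	return m[1], args, true
}

func main() {
	cmd, args, ok := splitCommand(`COPY "filename one" "filename two"`)
	fmt.Println(cmd, args, ok)
}
```

Unlike the looser single-regex version, this rejects malformed lines (e.g. a lowercase command) instead of silently returning partial results.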

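I think this may solve your problem:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Command: the leading uppercase word, followed by any spaces.
    re1 := regexp.MustCompile(`^([A-Z]+) *`)
    // Argument: a double-quoted string; group 1 is the text without the quotes.
    commandArgsRegex := regexp.MustCompile(`"([^"]+)"`)

    commandText := `COPY "filename one" "filename two"`
    loc := re1.FindStringSubmatchIndex(commandText)
    if loc == nil {
        fmt.Println("Failed")
        return
    }
    // loc[2]:loc[3] spans group 1 (the command); the arguments start at loc[1].
    index := loc[1]
    fmt.Println("Command =", commandText[loc[2]:loc[3]])
    for i, arg := range commandArgsRegex.FindAllStringSubmatch(commandText[index:], -1) {
        fmt.Println("arg", i, "=", arg[1])
    }
}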

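Add a separate capture group for each argument. If you make the groups optional, the unused ones will be empty, but the match will still work:

re1, _ := regexp.Compile(`^([A-Z]+)(?:\s"([^"]+)")?(?:\s"([^"]+)")?(?:\s"([^"]+)")?$`)

Add more (?:\s"([^"]+)")? groups, up to the maximum number of arguments you need. I put in three because the longest example (LIST) has three arguments. Making the first group optional too lets DISCONNECT match, and keeping the quotes outside the capture groups means each group holds only the argument text.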
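To get a []string out of fixed groups, the non-empty ones can be collected. This sketch assumes a variant of the expression above in which every argument group is optional and only the text inside the quotes is captured; `parseFixed` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// fixedRe allows at most three arguments. Every argument group is optional,
// and only the text inside the quotes is captured.
var fixedRe = regexp.MustCompile(`^([A-Z]+)(?:\s"([^"]+)")?(?:\s"([^"]+)")?(?:\s"([^"]+)")?$`)

// parseFixed returns the command and its arguments. Because [^"]+ never
// matches an empty string, an empty group means that slot was not used.
func parseFixed(input string) (string, []string, bool) {
	m := fixedRe.FindStringSubmatch(input)
	if m == nil {
		return "", nil, false
	}
	var args []string
	for _, a := range m[2:] {
		if a != "" {
			args = append(args, a)
		}
	}
	return m[1], args, true
}

func main() {
	fmt.Println(parseFixed(`LIST "x" "y" "z"`))
}
```

The trade-off is the hard limit: a line with more arguments than there are groups does not match at all.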