正则表达式与golang中嵌套组的问题

Consider the following toy example. I want to match in Go a name with a regexp where the name is sequences of letters a separated by single #, so a#a#aaa is valid, but a# or a##a are not. I can code the regexp in the following two ways:

r1 := regexp.MustCompile(`^a+(#a+)*$`)
r2 := regexp.MustCompile(`^(a+#)*a+$`)

Both of these work. Now consider more complex task of matching a sequence of names separated by single slash. As in above, I can code that in two ways:

^N+(/N+)*$
^(N+/)*N+$

where N is a regexp for the name with ^ and $ stripped. As I have two cases for N, so now I can have 4 regexps:

    ^a+(#a+)*(/a+(#a+)*)*$
    ^(a+#)*a+(/a+(#a+)*)*$
    ^((a+#)*a+/)*a+(#a+)*$
    ^((a+#)*a+/)*(a+#)*a+$

The question is why when matching against the string "aa#a#a/a#a/a" the first one fails while the rest 3 cases work as expected? I.e. what causes the first regexp to mismatch? The full code is:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    str := "aa#a#a/a#a/a"
    regs := []string {
        `^a+(#a+)*(/a+(#a+)*)*$`,
        `^(a+#)*a+(/a+(#a+)*)*$`,
        `^((a+#)*a+/)*a+(#a+)*$`,
        `^((a+#)*a+/)*(a+#)*a+$`,
    }    
    for _, r := range(regs) {
        fmt.Println(regexp.MustCompile(r).MatchString(str))
    } 
}

Surprisingly it prints false true true true

You can try to alleviate atomic sub-nesting of quantifiers.
If you didn't have the anchors, the expression would possibly blow up
in backtracking if using nested optional quantifiers like this,
when it can't find a direct solution.

Go might be balking at that, forcing it to fail outright instead
of massive backtracking, but not sure.

Try something like this, see if it works.

 # ^a+(#?a)*(/a+(#?a)*)*$

 ^ 
 a+
 (                    # (1 start)
      \#?
      a 
 )*                   # (1 end)
 (                    # (2 start)
      /
      a+
      (                    # (3 start)
           \#?
           a 
      )*                   # (3 end)
 )*                   # (2 end)
 $

edit: (Transposed from comment) If complexity is too high, some engines will not even compile it, some will silently fail at runtime, some will lock up in a backtracking loop.

This little subtle difference is the problem

bad: too complex (#?a+)*
good: no nested, open ended quantifiers (#?a)*

If you ever have this problem, strip out the nesting, usually solves
the backtracking issue.


eit2: If you require a separator and want to insure a separator is only in the middle and surrounded by a's, you could try this

https://play.golang.org/p/oM6B6H3Kdx

 #  ^a+[#/](a[#/]?)*a+$

 ^ 
 a+
 [\#/] 
 (                             # (1 start)
      a
      [\#/]? 

 )*                            # (1 end)
 a+
 $ 

or this

https://play.golang.org/p/WihqSjH_dI

 # ^a+(?:[#/]a+)+$

 ^ 
 a+ 
 (?:
      [\#/] 
      a+
 )+
 $

This must be a bug in golang regexp engine. A simpler test case is that ^a(/a+(#a+)*)*$ fails to match a/a#a while ^(a)(/a+(#a+)*)*$ works, see http://play.golang.org/p/CDKxVeXW98 .

I filed https://github.com/golang/go/issues/11905