I am trying to parse multiple tags in one string literal, such as `name=testName, key=testKey, columns=(c1, c2, c3)`, and I might add more tags with different syntax to this string in the near future. So it seems natural to use a regex to implement it.
As for the syntax, these are valid:
`name=testName,key=testKey`
`name=testName, key=testKey`
`name=testName key=testKey`
`name=testName key=testKey`
`name=testName key=testKey columns=(c1 c2 c3)`
`name=testName key=testKey columns=(c1, c2, c3)`
`name=testName, key=testKey, columns=(c1 c2 c3)`
invalid:
`name=testName,, key=testKey` (multiple commas in between)
`name=testName, key=testKey,` (end with a comma)
`name=testName, key=testKey, columns=(c1,c2 c3)` (you can only use commas or whitespace consistently inside columns; the same rule applies to the tags as a whole, see below)
`name=testName, key=testKey columns=(c1,c2,c3)`
I came up with the whole pattern like this:
((name=\w+|key=\w+)+,\s*)*(name=\w+|key=\w+)+
I am wondering whether it is possible to define each subpattern as its own regex and then combine them into a larger pattern, such as:
patternName := regexp.MustCompile(`name=\w+`)
patternKey := regexp.MustCompile(`key=\w+`)
pattern = ((patternName|patternKey)+,\s*)*(patternName|patternKey)+
Considering I will add more tags, the whole pattern will definitely get larger and uglier. Is there an elegant way to do this, like the combined form above?
Yes, what you want is possible. The `regexp.Regexp` type has a `String()` method, which returns the source text the expression was compiled from. So you can use this to combine regular expressions:
patternName := regexp.MustCompile(`name=\w+`)
patternKey := regexp.MustCompile(`key=\w+`)
pattern := regexp.MustCompile(`((` + patternName.String() + `|` + patternKey.String() + `)+,\s*)*(` + patternName.String() + `|` + patternKey.String() + `)+`)
This can be shortened (though it is less efficient) with `fmt.Sprintf`:
pattern := regexp.MustCompile(fmt.Sprintf(`((%s|%s)+,\s*)*(%s|%s)+`, patternName, patternKey, patternName, patternKey))
But just because it's possible doesn't mean you should do it...
Your particular examples would be much more easily handled using standard text parsing functions such as `strings.Split` or `strings.FieldsFunc`, etc. Given your provided sample inputs, I would do it this way:
`name` and/or `key`)
This code will be far easier to read, and will probably execute hundreds or thousands of times faster than a regular expression. This approach also lends itself easily to stream processing, which can be a big benefit if you're processing hundreds of records or more and don't want to consume a lot of memory. (Regexp can be made to do this as well, but it's still less readable.)
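For illustration, here is one way the plain-text approach might look. This is a sketch under my own assumptions, not the exact code described above: `parseTags` is a hypothetical name, it only knows the `name` and `key` tags, it does not handle `columns=(...)`, and it does not check values against `\w+`. It uses `strings.Cut`, which needs Go 1.18 or later.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseTags (hypothetical helper) splits s into key=value tags.
// Assumption: a string uses either commas or whitespace as the
// separator, never a mix; only "name" and "key" are recognized.
func parseTags(s string) (map[string]string, error) {
	var fields []string
	if strings.Contains(s, ",") {
		// Comma form: each piece must be exactly one non-empty token,
		// which rejects ",," and a trailing comma.
		for _, f := range strings.Split(s, ",") {
			f = strings.TrimSpace(f)
			if f == "" || strings.ContainsAny(f, " \t") {
				return nil, fmt.Errorf("malformed separator near %q", f)
			}
			fields = append(fields, f)
		}
	} else {
		// Whitespace form: any run of whitespace separates tags.
		fields = strings.Fields(s)
	}
	if len(fields) == 0 {
		return nil, errors.New("no tags found")
	}

	tags := make(map[string]string, len(fields))
	for _, f := range fields {
		k, v, ok := strings.Cut(f, "=")
		if !ok || v == "" {
			return nil, fmt.Errorf("expected key=value, got %q", f)
		}
		switch k {
		case "name", "key":
			tags[k] = v
		default:
			return nil, fmt.Errorf("unknown tag %q", k)
		}
	}
	return tags, nil
}

func main() {
	for _, in := range []string{
		"name=testName, key=testKey",
		"name=testName key=testKey",
		"name=testName,, key=testKey",
	} {
		tags, err := parseTags(in)
		fmt.Println(tags, err)
	}
}
```

Supporting a new tag would mean adding a `case` to the `switch`; a tag with its own inner syntax, such as `columns`, would need the splitting step to be aware of parentheses.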