I have a string that follows a specific pattern like so operator(field,value)
and I'd like to use regex to extract out all three of operator, field and value. I'm struggling to come up with the syntax for how to capture these. In this case value can be alphanumeric as well, for example
"contains(name, Joe)"
or "lt(quantity, 2.5)"
Use something like this to capture groups, you may want to limit the characters accepted with [], note the use of ` and the use of \ escaping for () within the regexp:
func main() {
re := regexp.MustCompile(`(.+)\((.+),\s?(.+)\)`)
for _, t := range tests {
fmt.Println("result", re.FindStringSubmatch(t))
}
}
https://play.golang.org/p/43YLTafgQt
output:
result [contains(field, value) contains field value]
result [contains(name, Joe) contains name Joe]
result [lt(quantity, 2.5) lt quantity 2.5]
result [plus(no,44) plus no 44]
Depending on how strict you want to be you could use [a-z]+ or similar instead of .+ to match only certain characters but if you are not worried about bogus values this would probably be fine.
I don't know golang, but I do know regex's, so I'll do what I can here.
You probably want a group each for the "operator", "field", and "value". I'm going to assume for now that each of these can be represented as any combination of alphabetic, numeric, or underscore characters, with length of at least one character. In regex, we have a shortcut for that: \w
represents a single alpha-numeric or underscore character, and the +
modifier means "one or more". So \w+
means one or more such character in a row. If you want a more complex definition of what these fields can be named, I'll let you specify that in your question.
You say that you want to support "operator(field,value)". I'll start without whitespace anywhere, because it's simpler and you can easily remove all whitespace yourself before running the regex. We'll later add some whitespace support to the regex if you want it, but it'll make life difficult.
To do this, we want three groups, "1(2,3)" where 1 is the operator name, 2 is the field name, and 3 is the value name. Each of these, as given above, will be \w+
in our regex. We'll want to match the open and close parentheses as well as the comma, but we'll throw them away because they're really just delimiters. The parentheses will need to be escaped in the regex, since regex's have a special meaning for parentheses. The result looks like:
(\w+)\((\w+),(\w+)\)
\ 1 / \ 2 / \ 3 /
Where the second line shows you where the groups are each defined.
If you want to support some whitespace, you'll need to add \s*
in all such locations. This gets hairy, but you can do it as such:
(\w+)\s*\(\s*(\w+)\s*,\s*(\w+)\s*\)
\ 1 / \ 2 / \ 3 /
You give an example of wanting to support floating point values, and I presume other kinds of values too. You can accomplish this using the "or" pipe, |
. For example, group 3, instead of just being \w+
, could be defined as
[a-zA-Z_]\w*|\d+\.?|\d*\.\d+
This string will support alphanumeric+underscore strings where the first character must be alphabetic or underscore, OR integers, OR floating point (defined as an integer string with a period at the beginning, middle, or end). Clearly, this can go on and on to support more complex string values, but you get the idea.
So the final regex might look like:
(\w+)\s*\(\s*(\w+)\s*,\s*([a-zA-Z_]\w+|\d+\.?|\d*\.\d+)\s*\)
Sorry for not giving any golang help, I hope someone else can edit my answer and fill in that major gap.