How do I include the operators in the output when splitting a string?

Yesterday I asked this question about splitting a string in Python. I've since decided to do this project in Go instead. I have the following:

input := "house-width + 3 - y ^ (5 * house length)"
s := regexp.MustCompile(" ([+-/*^]) ").Split(input, -1)
log.Println(s)  //  [house-width 3 y (5 house length)]

How do I include the operators in this output? e.g. I'd like the following output:

['house-width', '+', '3', '-', 'y', '^', '(5', '*', 'house length)']

EDIT: To clarify, I am splitting on the space-separated operators, not just on the operator character itself. An operator must have a space on both sides to distinguish it from a dash/hyphen. Please refer to my original Python question I linked to for clarification if needed.

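You can get the operands of your expression by calling Split() on the compiled regexp (just as you did), and you can get the operators (the separators) by calling FindAllString() on the same regexp.

This gives you 2 separate []string slices; if you want the result in one []string slice, merge them by interleaving. Note that the hyphen is escaped inside the character class below: unescaped, +-/ is read as a character range, which also matches a comma and a period.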

input := "house-width + 3 - y ^ (5 * house length)"

r := regexp.MustCompile(`\s([+\-/*^])\s`)

s1 := r.Split(input, -1)
s2 := r.FindAllString(input, -1)

fmt.Printf("%q\n", s1)
fmt.Printf("%q\n", s2)

all := make([]string, len(s1)+len(s2))
for i := range s1 {
    all[i*2] = s1[i]
    if i < len(s2) {
        all[i*2+1] = s2[i]
    }
}
fmt.Printf("%q\n", all)

Output (try it on the Go Playground):

["house-width" "3" "y" "(5" "house length)"]
[" + " " - " " ^ " " * "]
["house-width" " + " "3" " - " "y" " ^ " "(5" " * " "house length)"]

Note:

If you want to trim the spaces from the operators, you can use the strings.TrimSpace() function for that:

for i, v := range s2 {
    all[i*2+1] = strings.TrimSpace(v)
}
fmt.Printf("%q\n", all)

Output:

["house-width" "+" "3" "-" "y" "^" "(5" "*" "house length)"]

If you're planning to parse the expression afterwards, you'll have to make some changes:

  • Include parentheses as lexemes
  • You can't allow both spaces and dashes as valid identifier characters, because then e.g. "- y" between 3 and ^ would be a valid identifier.

After that's done, you can use a simple linear iteration to lex your string:

package main

import (
    "bytes"
    "fmt"
)

func main() {

    input := `house width + 3 - y ^ (5 * house length)`
    buffr := bytes.NewBuffer(nil)
    outpt := make([]string, 0)

    for _, r := range input {
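        // Operators, parentheses and digits are emitted as single-character
        // tokens, so a multi-digit number is split into one token per digit.
        if r == '+' || r == '-' || r == '*' || r == '/' || r == '^' || r == '(' || r == ')' || (r >= '0' && r <= '9') {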
            bs := bytes.TrimSpace(buffr.Bytes())
            if len(bs) > 0 {
                outpt = append(outpt, (string)(bs))
            }
            outpt = append(outpt, (string)(r))
            buffr.Reset()
        } else {
            buffr.WriteRune(r)
        }
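    }
    // Flush a trailing identifier if the input does not end with an operator.
    if bs := bytes.TrimSpace(buffr.Bytes()); len(bs) > 0 {
        outpt = append(outpt, string(bs))
    }

    fmt.Printf("%#v\n", outpt)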

}

Once lexed, use Dijkstra's shunting-yard algorithm to build an AST or directly evaluate the expression.

I think FindAll() may be the way to go.

Expanded regex:

 \s*                           # Trim preceding whitespace
 (                             # (1 start), Operator/Non-Operator chars
      (?:                           # Cluster group
           \w -                          # word dash
        |  - \w                          # or, dash word
        |  [^+\-/*^]                     # or, a non-operator char
      )+                            # End cluster, do 1 to many times
   |                              # or,
      [+\-/*^]                      # A single simple math operator
 )                             # (1 end)
 \s*                           # Trim trailing whitespace

Go code snippet:

http://play.golang.org/p/bHZ21B6Tzi

package main

import (
    "log"
    "regexp"
)

func main() {
    in := []byte("house-width + 3 - y ^ (5 * house length)")
    // Same pattern as the expanded regex above, written on one line.
    r := regexp.MustCompile("\\s*((?:\\w-|-\\w|[^+\\-/*^])+|[+\\-/*^])\\s*")
    s := r.FindAll(in, -1)
    for _, ss := range s {
        log.Println(string(ss))
    }
}

Output:

 house-width 
 + 
 3 
 - 
 y 
 ^ 
 (5 
 * 
 house length)