What is an efficient way to do URL matching and tag extraction?

Given the two strings a = "/some/{tag}/here" and b = "/some/text/here", I would like an efficient algorithm to verify whether b matches the pattern defined by a and, if it does, to extract the corresponding part of b into a variable (i.e., tag = "text").

Implementations in C or Go are also welcome but pseudocode will do just fine.

Read about the Knuth–Morris–Pratt string searching algorithm. Should give you all you need including pseudo code.

Many good regex toolkits can do this, but you might have to change the syntax of patterns. E.g., here's the Python version:

>>> import re
>>> a = re.compile("/some/(?P<pattern>.+)/here")
>>> b = "/some/text/here"
>>> a.match(b).group("pattern")
'text'

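Maybe you could split a and b on '/' and compare the pieces:

string[] array1 = a.Split('/');
string[] array2 = b.Split('/');
// index 2 is the "{tag}" slot, so compare the fixed segments around it
bool isEqual = (array1[1] == array2[1]) && (array1[3] == array2[3]);
string tag = array2[2];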
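The same split-and-compare idea can be sketched in Go for any number of segments. This is only a sketch under assumptions: matchSegments is a hypothetical helper name, a tag is assumed to fill a whole path segment, and tag values are assumed to be non-empty and free of slashes.

```go
package main

import (
	"fmt"
	"strings"
)

// matchSegments (hypothetical helper) compares pattern and path segment by
// segment. A segment written as {name} captures the matching path segment.
func matchSegments(pattern, path string) (map[string]string, bool) {
	ps := strings.Split(pattern, "/")
	bs := strings.Split(path, "/")
	if len(ps) != len(bs) {
		return nil, false
	}
	tags := make(map[string]string)
	for i, seg := range ps {
		if len(seg) > 2 && strings.HasPrefix(seg, "{") && strings.HasSuffix(seg, "}") {
			if bs[i] == "" {
				return nil, false // assumption: a tag must capture something
			}
			tags[seg[1:len(seg)-1]] = bs[i]
			continue
		}
		if seg != bs[i] {
			return nil, false // literal segments must be identical
		}
	}
	return tags, true
}

func main() {
	tags, ok := matchSegments("/some/{tag}/here", "/some/text/here")
	fmt.Println(tags["tag"], ok) // prints: text true
}
```

The work is linear in the length of the two strings, and no pattern compilation step is needed.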

Go answer: The Go standard library has a URL parser and regular expression package to help you. Go does not let you name variables at runtime, so getting your answer as tag = "text" doesn't quite make sense. Instead you might want to return a result as a struct, or perhaps collect multiple results in a map. An outline might go something like,

1. Compile a regexp that matches your tag syntax with the braces. You do this once when the program loads. Let's call this tagRE.
2. Apply tagRE to pattern "a". The results of this match will be the parts of the URL to match, and the name of the tag. (If the match fails, pattern "a" is invalid.)
3. Use the results to construct and compile another regexp that matches that pattern in a real URL. Let's call this aRE. Hold on to this regexp as long as you think you might need to match this pattern in the future. There's no sense in repeating the work of compiling it.
4. Repeat steps 2 and 3 as needed for other patterns, perhaps as patterns become available to your program. Maybe collect the results in a slice or map. I'm guessing you will also want to associate each one with something useful in your application, like some code to execute when a match is found.
5. When you have a real URL you want to match, you probably want to parse it first with the URL package to separate out the URL path.
6. Apply aRE (or all regexps in the slice) to the path and see if you have a match. If so, return a result containing the tag name from a and the part of the path that aRE matched. You do this by creating a result struct or adding to your result map.
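The URL-parsing step in the outline above might look roughly like this. It is a sketch only: pathOf is a hypothetical helper name and the example.com URL is made up for illustration; url.Parse and the Path field come from the standard net/url package.

```go
package main

import (
	"fmt"
	"net/url"
)

// pathOf (hypothetical helper) returns only the path component of rawURL,
// dropping the scheme, host, query string and fragment.
func pathOf(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	return u.Path, nil
}

func main() {
	// The URL below is an invented example.
	p, err := pathOf("http://example.com/some/text/here?page=2#top")
	if err != nil {
		fmt.Println("bad url:", err)
		return
	}
	fmt.Println(p) // prints: /some/text/here
}
```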

Code showing construction of regular expressions:

package main

import (
    "fmt"
    "regexp"
)

var a = "/some/{tag}/here/{and}/there"
var aPath = `/some/bread/here/jam/there`

func main() {
    tagPat := regexp.MustCompile("([^{]*){([^}]+)}")
    aMatch := tagPat.FindAllStringSubmatch(a, -1)
    if aMatch == nil {
        fmt.Println("bad pattern")
        return
    }
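    // Build the URL regexp piece by piece: quoted literal text, then a
    // named capture group for each tag.
    aRE := ""
    matchLen := 0
    for _, m := range aMatch {
        if m[1] != "" {
            aRE += `\Q` + m[1] + `\E`
        }
        aRE += "(?P<" + m[2] + ">.*)"
        matchLen += len(m[0])
    }
    // Append any literal text that follows the last tag.
    if matchLen < len(a) {
        aRE += `\Q` + a[matchLen:] + `\E`
    }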
    aPat := regexp.MustCompile(aRE)
    pathMatch := aPat.FindStringSubmatch(aPath)
    if pathMatch == nil {
        fmt.Println("url doesn't match")
        return
    }
    for tx, tag := range aPat.SubexpNames()[1:] {
        fmt.Println(tag, "=", pathMatch[tx+1])
    }
}

Output:

tag = bread
and = jam
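Note that the regexp built above is not anchored, and .* will also match slashes. If you want the whole path to match and each tag to stay inside one path segment, a variant along these lines may be closer to what you need. It is a sketch under assumptions: compilePattern and extract are hypothetical helper names, tags are assumed to match [^/]+, and tag names must be valid regexp group names (letters, digits, underscore).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tagRE finds each {name} in a pattern; it is compiled once.
var tagRE = regexp.MustCompile(`\{([^{}]+)\}`)

// compilePattern (hypothetical helper) turns "/some/{tag}/here" into the
// anchored regexp ^/some/(?P<tag>[^/]+)/here$.
func compilePattern(pattern string) (*regexp.Regexp, error) {
	var sb strings.Builder
	sb.WriteString("^")
	last := 0
	for _, loc := range tagRE.FindAllStringSubmatchIndex(pattern, -1) {
		// loc[0]:loc[1] spans "{tag}", loc[2]:loc[3] spans "tag".
		sb.WriteString(regexp.QuoteMeta(pattern[last:loc[0]]))
		sb.WriteString("(?P<" + pattern[loc[2]:loc[3]] + ">[^/]+)")
		last = loc[1]
	}
	sb.WriteString(regexp.QuoteMeta(pattern[last:]))
	sb.WriteString("$")
	return regexp.Compile(sb.String())
}

// extract returns the tag values, or false when path does not match.
func extract(re *regexp.Regexp, path string) (map[string]string, bool) {
	m := re.FindStringSubmatch(path)
	if m == nil {
		return nil, false
	}
	tags := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" {
			tags[name] = m[i]
		}
	}
	return tags, true
}

func main() {
	re, err := compilePattern("/some/{tag}/here/{and}/there")
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	tags, ok := extract(re, "/some/bread/here/jam/there")
	fmt.Println(tags["tag"], tags["and"], ok) // prints: bread jam true
}
```

Returning a map keeps the result independent of how many tags a pattern contains.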

So you have a pattern string of the form /some/{tag}/here, and you want to determine if some other string matches that pattern. If it does, then you want to extract the {tag} portion.

Seems to me that you could split your pattern string into three parts:

"/some/"
"{tag}"
"/here"

Now, using standard C comparison functions (I'm thinking something like strncmp), check to see if the string starts with "/some/" and ends with "/here". If it does, then you can easily find the beginning and length of the tag string:

stringBegin = s + strlen("/some/");
length = strlen(s) - strlen("/some/") - strlen("/here");

Then it's a simple matter of copying out that substring.

Of course my example is using constant strings. But if you can easily split out the components, then you can substitute variables for the constants.
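For what it's worth, here is how the prefix/suffix idea might look in Go with variables instead of constants. It is a sketch under assumptions: extractTag is a hypothetical helper name, the pattern is assumed to contain exactly one {tag}, and the captured value is not checked for slashes.

```go
package main

import (
	"fmt"
	"strings"
)

// extractTag (hypothetical helper) handles patterns with exactly one {tag}.
// It checks the fixed prefix and suffix, then slices out what lies between.
func extractTag(pattern, s string) (name, value string, ok bool) {
	open := strings.Index(pattern, "{")
	end := strings.Index(pattern, "}")
	if open < 0 || end < open {
		return "", "", false // no well-formed {tag} in the pattern
	}
	prefix, suffix := pattern[:open], pattern[end+1:]
	// s must hold both fixed parts plus at least one character for the tag.
	if len(s) <= len(prefix)+len(suffix) ||
		!strings.HasPrefix(s, prefix) || !strings.HasSuffix(s, suffix) {
		return "", "", false
	}
	return pattern[open+1 : end], s[len(prefix) : len(s)-len(suffix)], true
}

func main() {
	name, value, ok := extractTag("/some/{tag}/here", "/some/text/here")
	fmt.Println(name, value, ok) // prints: tag text true
}
```

The length check matters: without it, a string such as "/some/here" could satisfy both the prefix and the suffix test with overlapping characters.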

I'm assuming your tags can't have slashes in them. If that is not so, my solution won't work without considerable modification.

If the above holds true, though, then you can first tokenize your path into a list like user1288160 shows in his answer. My solution will be in Go.

path := strings.Split(strings.Trim(url, "/"), "/") // Trim avoids an empty first token

Then you can use a simple state machine to process the tokens.

type urlParser func([]string) (urlParser, []string, error)

// define handlers for the various tokens that do appropriate things
var parseMap map[string]urlParser

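// startParse looks up the handler for the first token and returns it
// together with the tokens that remain. Needs the "errors" package.
var startParse = func(ps []string) (urlParser, []string, error) {
   switch {
   case len(ps) == 0:
      return nil, nil, errors.New("End Of Path")
   case len(ps) == 1:
      return parseMap[ps[0]], nil, nil
   default: // a default case is required so every path returns
      return parseMap[ps[0]], ps[1:], nil
   }
}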

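p := startParse
var err error
for {
   // get the next step in the state machine, unparsed portion of the path
   // and any errors.
   next, rst, pErr := p(path)
   // an error means we are done; record it before leaving the loop.
   if pErr != nil {
     err = pErr
     break
   }
   // no handler registered for this token: stop instead of calling nil.
   if next == nil {
     err = errors.New("no handler for token")
     break
   }
   // set up for our next iteration of the parse loop.
   p = next
   path = rst
}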

Your urlParsers will be closures that populate some variable with whatever you matched against.
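To make the closure idea concrete, here is one way it could be wired up for a single pattern. This is a sketch, not the only way to do it: literal, capture and run are hypothetical helper names, the states are chained directly instead of being looked up in parseMap, and the pattern "/some/{tag}/here" is hard-coded in main.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

type urlParser func([]string) (urlParser, []string, error)

var errNoMatch = errors.New("no match")

// literal returns a state that consumes one token equal to want.
func literal(want string, next urlParser) urlParser {
	return func(ps []string) (urlParser, []string, error) {
		if len(ps) == 0 || ps[0] != want {
			return nil, nil, errNoMatch
		}
		return next, ps[1:], nil
	}
}

// capture returns a closure that stores one non-empty token in tags[name].
func capture(name string, tags map[string]string, next urlParser) urlParser {
	return func(ps []string) (urlParser, []string, error) {
		if len(ps) == 0 || ps[0] == "" {
			return nil, nil, errNoMatch
		}
		tags[name] = ps[0]
		return next, ps[1:], nil
	}
}

// run drives the state machine until it runs out of states or hits an error.
func run(start urlParser, path []string) error {
	for p := start; p != nil; {
		next, rest, err := p(path)
		if err != nil {
			return err
		}
		p, path = next, rest
	}
	if len(path) != 0 {
		return errNoMatch // tokens left over after the last state
	}
	return nil
}

func main() {
	tags := map[string]string{}
	// States for "/some/{tag}/here", built back to front.
	start := literal("some", capture("tag", tags, literal("here", nil)))
	err := run(start, strings.Split(strings.Trim("/some/text/here", "/"), "/"))
	fmt.Println(tags["tag"], err) // prints: text <nil>
}
```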

To help properly, we need some background information. For example, what does the tag consist of: digits? Letters? Both? Which characters are allowed?

First scenario: assuming that the position of the target path segment is fixed, you can do something like this:

C code:

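/* Needs <stdio.h>, <stdlib.h> and <string.h>. */
const char *string = "/some/text/here";
const char *b = "text";
char *copy = strdup(string);        /* strtok modifies its argument */
char *path = NULL;

if (copy && strtok(copy, "/")) {    /* first token: "some" */
    path = strtok(NULL, "/");       /* second token: the tag position */
    if (path && !strcmp(b, path)) {
        /* They are equal. Do something. */
    } else {
        /* ... */
    }
} else {
    printf("Tag not found.\n");
}
free(copy);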

Second scenario:

Assuming that you know only the predecessor of the target path segment, you can do something like this:

C code:

/* Needs <stdlib.h> and <string.h>. */
const char *string = "/some/text/here";

char *copy = strdup(string);      /* strtok modifies its argument */
char *cpath,                      /* Current path */
     *ppath = NULL;               /* Predecessor path */
const char *ptpath  = "some",     /* Predecessor path target */
           *pathcmp = "text";     /* Path to compare */

cpath = copy ? strtok(copy, "/") : NULL;

while (cpath) {
    ppath = cpath;
    cpath = strtok(NULL, "/");

    if (!strcmp(ppath, ptpath)) {
        /* cpath is NULL when the predecessor was the last segment */
        if (cpath && !strcmp(cpath, pathcmp)) {
            /* They are equal. */
        } else {
            /* ... */
        }

        break;
    }
}
free(copy);
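Since the question also mentions Go, the second scenario could be sketched there as well. This is only a sketch: segmentAfter is a hypothetical helper name, and it returns the segment that follows the first occurrence of the predecessor rather than comparing it against a fixed value.

```go
package main

import (
	"fmt"
	"strings"
)

// segmentAfter (hypothetical helper) returns the path segment that follows
// the first segment equal to pred. ok is false when pred is absent or is
// the last segment of the path.
func segmentAfter(path, pred string) (seg string, ok bool) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	for i := 0; i+1 < len(parts); i++ {
		if parts[i] == pred {
			return parts[i+1], true
		}
	}
	return "", false
}

func main() {
	seg, ok := segmentAfter("/some/text/here", "some")
	fmt.Println(seg, ok) // prints: text true
}
```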

In very simple cases like these you can avoid regular expressions and URI parsing altogether (within reason, of course).

I hope this helps.