Given the two strings a = "/some/{tag}/here" and b = "/some/text/here", I would like an efficient algorithm to verify if b matches the pattern defined by a and, if it does, to extract the corresponding part of b into a variable (i.e., tag = "text").
Implementations in C or Go are also welcome but pseudocode will do just fine.
Read about the Knuth–Morris–Pratt string searching algorithm. Should give you all you need including pseudo code.
Many good regex toolkits can do this, but you might have to change the syntax of patterns. E.g., here's the Python version:
>>> import re
>>> a = re.compile("/some/(?P<pattern>.+)/here")
>>> b = "/some/text/here"
>>> a.match(b).group("pattern")
'text'
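Maybe you could split both a and b on '/' and compare them segment by segment:
string[] array1 = a.Split('/');
string[] array2 = b.Split('/');
// Index 0 is empty because both strings start with '/'; index 2 of a holds "{tag}".
bool isEqual = (array1[1] == array2[1]) && (array1[3] == array2[3]);
string tag = array2[2];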
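The same split-and-compare idea can be sketched in Go. This is only an illustration: the function name matchSegments is made up here, and it assumes that a placeholder always fills a whole segment and never matches an empty one.

```go
package main

import (
	"fmt"
	"strings"
)

// matchSegments (hypothetical helper) compares pattern and path segment by
// segment. A pattern segment written as {name} matches any single non-empty
// path segment, and the matched text is stored under name.
func matchSegments(pattern, path string) (map[string]string, bool) {
	ps := strings.Split(pattern, "/")
	bs := strings.Split(path, "/")
	if len(ps) != len(bs) {
		return nil, false
	}
	vars := make(map[string]string)
	for i, seg := range ps {
		if len(seg) > 2 && strings.HasPrefix(seg, "{") && strings.HasSuffix(seg, "}") {
			if bs[i] == "" {
				return nil, false // assumption: a placeholder needs some text
			}
			vars[seg[1:len(seg)-1]] = bs[i]
			continue
		}
		if seg != bs[i] {
			return nil, false // literal segments must be identical
		}
	}
	return vars, true
}

func main() {
	vars, ok := matchSegments("/some/{tag}/here", "/some/text/here")
	fmt.Println(vars["tag"], ok) // prints: text true
}
```

Splitting allocates two slices per call, so for a hot path it may be worth splitting the pattern once and reusing the result.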
Go answer: The Go standard library has a URL parser and regular expression package to help you. Go does not let you name variables at runtime, so getting your answer as tag = "text" doesn't quite make sense. Instead you might want to return a result as a struct, or perhaps collect multiple results in a map. An outline might go something like this.
Code showing construction of regular expressions:
package main

import (
	"fmt"
	"regexp"
)

var a = "/some/{tag}/here/{and}/there"
var aPath = `/some/bread/here/jam/there`

func main() {
	// Each match is a literal prefix (group 1) followed by a {name} placeholder (group 2).
	tagPat := regexp.MustCompile("([^{]*){([^}]+)}")
	aMatch := tagPat.FindAllStringSubmatch(a, -1)
	if aMatch == nil {
		fmt.Println("bad pattern")
		return
	}
	// Build an anchored regular expression so that the whole path has to match.
	aRE := "^"
	matchLen := 0
	for _, m := range aMatch {
		if m[1] != "" {
			aRE += `\Q` + m[1] + `\E` // quote the literal text
		}
		aRE += "(?P<" + m[2] + ">.*)" // named group for the placeholder
		matchLen += len(m[0])
	}
	if matchLen < len(a) {
		aRE += `\Q` + a[matchLen:] + `\E` // literal text after the last placeholder
	}
	aRE += "$"
	aPat := regexp.MustCompile(aRE)
	pathMatch := aPat.FindStringSubmatch(aPath)
	if pathMatch == nil {
		fmt.Println("url doesn't match")
		return
	}
	// SubexpNames()[0] is the whole match, so skip it.
	for tx, tag := range aPat.SubexpNames()[1:] {
		fmt.Println(tag, "=", pathMatch[tx+1])
	}
}
Output:
tag = bread
and = jam
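To collect the results in a map, as suggested above, the construction could be wrapped in two small functions. This is a sketch, not the code from the answer: the names compilePattern and extractVars are invented for illustration, it uses regexp.QuoteMeta instead of \Q...\E, and it assumes a placeholder never spans a slash (hence [^/]+ rather than .*).

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder finds {name} markers in a pattern string.
var placeholder = regexp.MustCompile(`\{([^}]+)\}`)

// compilePattern (hypothetical helper) turns "/some/{tag}/here" into an
// anchored regexp with one named group per placeholder.
func compilePattern(pattern string) (*regexp.Regexp, error) {
	expr := "^"
	last := 0
	for _, loc := range placeholder.FindAllStringSubmatchIndex(pattern, -1) {
		// loc[0]:loc[1] is the whole "{name}", loc[2]:loc[3] is just the name.
		expr += regexp.QuoteMeta(pattern[last:loc[0]])
		expr += "(?P<" + pattern[loc[2]:loc[3]] + ">[^/]+)"
		last = loc[1]
	}
	expr += regexp.QuoteMeta(pattern[last:]) + "$"
	// Compile reports an error if a placeholder name is not a valid group name.
	return regexp.Compile(expr)
}

// extractVars (hypothetical helper) returns the named captures as a map.
func extractVars(re *regexp.Regexp, path string) (map[string]string, bool) {
	m := re.FindStringSubmatch(path)
	if m == nil {
		return nil, false
	}
	vars := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if i > 0 && name != "" {
			vars[name] = m[i]
		}
	}
	return vars, true
}

func main() {
	re, err := compilePattern("/some/{tag}/here/{and}/there")
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	if vars, ok := extractVars(re, "/some/bread/here/jam/there"); ok {
		fmt.Println(vars["tag"], vars["and"]) // prints: bread jam
	}
}
```

Compiling the pattern once and reusing the resulting regexp keeps the per-path cost down to a single match.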
So you have a pattern string of the form /some/{tag}/here, and you want to determine if some other string matches that pattern. If it does, then you want to extract the {tag} portion.
Seems to me that you could split your pattern string into three parts:
"/some/"
"{tag}"
"/here"
Now, using standard C comparison functions (I'm thinking something like strncmp), check to see if the string s starts with "/some/" and ends with "/here". If it does, and s is at least as long as the two parts together, then you can easily find the beginning and length of the tag string:
stringBegin = s + strlen("/some/");
length = strlen(s) - strlen("/some/") - strlen("/here");
Then it's a simple matter of copying out that substring.
Of course my example is using constant strings. But if you can easily split out the components, then you can substitute variables for the constants.
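Since Go was also welcome, here is a rough Go sketch of the same prefix/suffix check with variables in place of the constants. The helper name extractBetween is made up for this example, and the pattern is assumed to contain exactly one {tag} placeholder.

```go
package main

import (
	"fmt"
	"strings"
)

// extractBetween (hypothetical helper) reports whether s starts with prefix
// and ends with suffix, and returns the text in between.
func extractBetween(s, prefix, suffix string) (string, bool) {
	// Without this check the prefix and suffix could overlap in a short s.
	if len(s) < len(prefix)+len(suffix) {
		return "", false
	}
	if !strings.HasPrefix(s, prefix) || !strings.HasSuffix(s, suffix) {
		return "", false
	}
	return s[len(prefix) : len(s)-len(suffix)], true
}

func main() {
	// Assumption: the pattern holds exactly one {name} placeholder.
	pattern := "/some/{tag}/here"
	open := strings.Index(pattern, "{")
	end := strings.Index(pattern, "}")
	prefix, name, suffix := pattern[:open], pattern[open+1:end], pattern[end+1:]

	if value, ok := extractBetween("/some/text/here", prefix, suffix); ok {
		fmt.Println(name, "=", value) // prints: tag = text
	}
}
```

The work is two bounded comparisons and one slice, so it is linear in the length of the prefix and suffix and does not allocate.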
I'm assuming your tags can't have slashes in them. If that is not so, my solution won't work without considerable modification.
If the above holds true, though, you can first tokenize your path into a list, as user1288160 shows in his answer. My solution will be in Go.
path := strings.Split(url, "/")
Then you can use a simple state machine to process the tokens.
type urlParser func([]string) (urlParser, []string, error)

// define handlers for the various tokens that do appropriate things
var parseMap map[string]urlParser

var startParse = func(ps []string) (urlParser, []string, error) {
	switch {
	case len(ps) == 0:
		return nil, nil, errors.New("End Of Path")
	case len(ps) == 1:
		return parseMap[ps[0]], nil, nil
	default:
		// a default case is needed here, otherwise Go reports a missing return
		return parseMap[ps[0]], ps[1:], nil
	}
}

p := startParse
var err error
for {
	// get the next step in the state machine, unparsed portion of the path
	// and any errors.
	next, rst, pErr := p(path)
	// an error means we are done; record it before leaving the loop.
	if pErr != nil {
		err = pErr
		break
	}
	// set up for our next iteration of the parse loop.
	p = next
	path = rst
}
Your urlParsers will be closures that populate some variable with whatever you matched against.
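To make the closure idea concrete, here is a simplified, self-contained variant. It is not the exact design above: each state consumes a single segment instead of the whole slice, and the names stateFn, literal, capture and run are invented for this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// stateFn consumes one path segment and returns the next state.
// A nil next state means the pattern is complete.
type stateFn func(seg string) (stateFn, error)

// literal returns a state that accepts only the segment want.
func literal(want string, next stateFn) stateFn {
	return func(seg string) (stateFn, error) {
		if seg != want {
			return nil, errors.New("unexpected segment: " + seg)
		}
		return next, nil
	}
}

// capture returns a closure that stores whatever segment it sees in vars[name].
func capture(name string, vars map[string]string, next stateFn) stateFn {
	return func(seg string) (stateFn, error) {
		vars[name] = seg
		return next, nil
	}
}

// run feeds the segments of path through the machine, starting at start.
func run(start stateFn, path string) error {
	state := start
	for _, seg := range strings.Split(strings.Trim(path, "/"), "/") {
		if state == nil {
			return errors.New("path has extra segments")
		}
		next, err := state(seg)
		if err != nil {
			return err
		}
		state = next
	}
	if state != nil {
		return errors.New("path ended early")
	}
	return nil
}

func main() {
	vars := map[string]string{}
	// States for the pattern /some/{tag}/here, chained front to back.
	start := literal("some", capture("tag", vars, literal("here", nil)))
	if err := run(start, "/some/text/here"); err != nil {
		fmt.Println("no match:", err)
		return
	}
	fmt.Println("tag =", vars["tag"]) // prints: tag = text
}
```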
To help properly, we need some background information. For example, what is the "pattern" composed of: numbers, letters, or both? Which characters are allowed?
First scenario: Assuming that the position of the target path segment is fixed, you can do something like this:
C code:
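/* needs <stdio.h>, <stdlib.h> and <string.h> */
char *string = "/some/text/here";
char *path;
char *b = "text";
char *copy = strdup(string); /* strtok modifies its input, so work on a copy */
if (copy && strtok(copy, "/")) { /* the first token is "some" */
    path = strtok(NULL, "/"); /* the second token is the candidate */
    if (path && !strcmp(b, path)) {
        /* They are equal. Do something. */
    } else {
        /* ... */
    }
} else {
    printf("Tag not found.\n");
}
free(copy);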
Second scenario:
Assuming that you know only the predecessor of the target path segment, you can do something like this:
C code:
/* needs <stdlib.h> and <string.h> */
char *string = "/some/text/here";
char *cpath,               /* Current path */
     *ppath = NULL,        /* Predecessor path */
     *ptpath = "some",     /* Predecessor path target */
     *pathcmp = "text";    /* Path to compare */
char *copy = strdup(string); /* strtok modifies its input, so work on a copy */
cpath = copy ? strtok(copy, "/") : NULL;
while (cpath) {
    ppath = cpath;
    cpath = strtok(NULL, "/");
    if (!strcmp(ppath, ptpath)) {
        /* cpath is NULL when the predecessor was the last segment */
        if (cpath && !strcmp(cpath, pathcmp)) {
            /* They are equal. */
        } else {
            /* ... */
        }
        break;
    }
}
free(copy);
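For comparison, the second scenario could look roughly like this in Go. The helper name segmentAfter is an assumption made for this sketch, and it returns the first segment that follows the known predecessor.

```go
package main

import (
	"fmt"
	"strings"
)

// segmentAfter (hypothetical helper) returns the path segment that directly
// follows the first occurrence of pred.
func segmentAfter(path, pred string) (string, bool) {
	segs := strings.Split(strings.Trim(path, "/"), "/")
	// Stop one short of the end: the last segment has no successor.
	for i := 0; i+1 < len(segs); i++ {
		if segs[i] == pred {
			return segs[i+1], true
		}
	}
	return "", false
}

func main() {
	if seg, ok := segmentAfter("/some/text/here", "some"); ok {
		fmt.Println(seg == "text") // prints: true
	}
}
```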
For very simple cases like these, you can get by without regular expressions or URI parsing (within reason, of course).
I hope this helps.