I'm looking for a way to match part of, or the whole of, a previously matched group. For instance, assume we have the following text:
this is a very long text "with" some quoted strings I "need" to extract in their own context
A regex like (.{1,20})(".*?")(.{1,20})
gives the following output:
# | 1st group           | 2nd group | 3rd group
------------------------------------------------------------------
1 | is a very long text | "with"    | some quoted strings
2 | I                   | "need"    | to extract in their
The goal is to force the regex to re-match part of the 3rd group from the 1st match (or the whole of it, when the quoted strings are close together) while matching the 2nd one. Basically, I'd like to get the following output instead:
# | 1st group           | 2nd group | 3rd group
------------------------------------------------------------------
1 | is a very long text | "with"    | some quoted strings
2 | me quoted strings I | "need"    | to extract in their
A backreference would probably do the trick, but Go's regex engine doesn't support them.
If you go back to the original problem, what you need is to extract the quoted strings together with their context.
Since you don't have lookahead, you could use regexp just to match the quotes (or even just strings.Index) and get their byte ranges, then add the context yourself by widening each range. This may require more work with multi-byte UTF-8 text, since a byte offset can land in the middle of a character.
Something like:
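package main

import (
	"fmt"
	"regexp"
)

func main() {
	input := `this is a very long text "with" some quoted strings I "need" to extract in their own context`
	re := regexp.MustCompile(`(".*?")`)
	// Each match is a [start, end) pair of byte offsets into input.
	matches := re.FindAllStringIndex(input, -1)
	for _, m := range matches {
		// Widen the range by 20 bytes on each side, clamped to the input.
		s := m[0] - 20
		e := m[1] + 20
		if s < 0 {
			s = 0
		}
		if e > len(input) {
			e = len(input)
		}
		fmt.Printf("%s\n", input[s:e])
	}
}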
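If you want the three separate columns from the question rather than one joined string, and you want to be safe with multi-byte UTF-8 input, a variation along these lines might work. This is only a sketch: contexts is a hypothetical helper name, the window is counted in bytes (not characters), and each cut is moved inward to the nearest rune boundary so a character is never split in half.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// quoteRe matches a double-quoted string.
var quoteRe = regexp.MustCompile(`".*?"`)

// contexts is a hypothetical helper: for every quoted string in input it
// returns [before, quote, after], where before and after hold up to n bytes
// of surrounding text. Windows of neighbouring matches are allowed to overlap.
func contexts(input string, n int) [][3]string {
	var out [][3]string
	for _, m := range quoteRe.FindAllStringIndex(input, -1) {
		s, e := m[0]-n, m[1]+n
		if s < 0 {
			s = 0
		}
		if e > len(input) {
			e = len(input)
		}
		// Move both cuts inward until they fall on a rune boundary, so a
		// multi-byte UTF-8 character is never split in half.
		for s < m[0] && !utf8.RuneStart(input[s]) {
			s++
		}
		for e > m[1] && e < len(input) && !utf8.RuneStart(input[e]) {
			e--
		}
		out = append(out, [3]string{input[s:m[0]], input[m[0]:m[1]], input[m[1]:e]})
	}
	return out
}

func main() {
	input := `this is a very long text "with" some quoted strings I "need" to extract in their own context`
	for i, c := range contexts(input, 20) {
		fmt.Printf("%d | %q | %q | %q\n", i+1, c[0], c[1], c[2])
	}
}
```

Because every match is expanded independently from its own byte range, the context of the second quote can reuse text that already appeared after the first one, which is the overlap a single regex pass can't give you.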