I'm trying to write a regex in Go to test for Latin letters only.
I know that \p{Latin} matches any Latin-script character, but it also matches things such as Roman numerals (e.g. "ⅻ"). That leads me to \p{L}, which matches Unicode letters, but in any script, not just Latin.
The best I've been able to come up with so far is two regexes combined with &&:
latinRe := regexp.MustCompile(`\p{Latin}`)
letterRe := regexp.MustCompile(`\p{L}`)
// *regexp.Regexp has no Matches method; MatchString is the string variant.
if latinRe.MatchString(testString) && letterRe.MatchString(testString) {...}
I'm not happy that I can't test this as easily using something like regex101.com. Is there a better way? More succinct? Performant?
You can use a range like the following to specify all the characters you want to match. Depending on the regex engine, one of the following should work:
[A-Za-z\u00C0-\u00D6\u00D8-\u00f6\u00f8-\u00ff]
[A-Za-z\xC0-\xD6\xD8-\xf6\xf8-\xff]
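For Go specifically, here is a minimal sketch of the explicit-range approach. Go's regexp package (RE2 syntax) does not accept the \uXXXX escape, so the sketch uses the \x{...} form instead. The helper name isBasicLatin and the ^...+$ anchoring are my own assumptions for illustration; note that this range only covers ASCII and Latin-1 Supplement letters, not the whole Latin script.

```go
package main

import (
	"fmt"
	"regexp"
)

// basicLatinLetters covers ASCII letters plus the Latin-1 Supplement letters,
// skipping U+00D7 (multiplication sign) and U+00F7 (division sign).
// Go's regexp has no \uXXXX escape, so \x{...} is used for code points.
var basicLatinLetters = regexp.MustCompile(`^[A-Za-z\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{FF}]+$`)

// isBasicLatin is a hypothetical helper: it reports whether s is non-empty
// and consists only of characters in the ranges above.
func isBasicLatin(s string) bool {
	return basicLatinLetters.MatchString(s)
}

func main() {
	fmt.Println(isBasicLatin("caf\u00e9"))                // true
	fmt.Println(isBasicLatin("3\u00d74"))                 // false: digits and U+00D7
	fmt.Println(isBasicLatin("\u0141\u00f3d\u017a"))      // false: U+0141 and U+017A are outside Latin-1
}
```

If letters beyond Latin-1 (for example Polish or Vietnamese) need to match, the ranges would have to be extended or a script-based class used instead.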
Another option is to negate specific characters from a Unicode character class:
[^\P{Latin}\p{N}]
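As a rough sketch of how the negated class could be used in Go: the class reads as "not (non-Latin or a number)", which leaves Latin-script characters that are not in any Number category. The helper name isLatinLetters and the ^...+$ anchoring (to test the whole string rather than find one match) are assumptions added for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// latinLetters matches a non-empty string in which every character is in the
// Latin script and is not in a Unicode Number category (Nd, Nl, No).
var latinLetters = regexp.MustCompile(`^[^\P{Latin}\p{N}]+$`)

// isLatinLetters is a hypothetical helper wrapping the single regex.
func isLatinLetters(s string) bool {
	return latinLetters.MatchString(s)
}

func main() {
	samples := []string{
		"Z\u00fcrich", // Latin letters, including U+00FC
		"\u217b",      // small Roman numeral twelve: Latin script, category Nl
		"\u041f\u0440\u0438\u0432\u0435\u0442", // Cyrillic
		"abc123",      // ASCII digits belong to the Common script
	}
	for _, s := range samples {
		fmt.Printf("%q -> %v\n", s, isLatinLetters(s))
	}
}
```

This keeps the check to one compiled regex, and unlike two separate matches joined with &&, it applies both conditions to the same character.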