What is the best way to validate international domain names in Go?
https://golang.org/pkg/net/?m=all#isDomainName
https://golang.org/src/net/dnsclient.go?s=3444:3476#L109
One option might be to copy this function, which the net package does not export.
We need to validate domains like icaan.org and example.(special characters)
EDIT: IDN https://en.wikipedia.org/wiki/Internationalized_domain_name
We are already using govalidator, which fails to validate IDNs because it doesn't handle Unicode characters: https://github.com/asaskevich/govalidator/blob/master/validator.go
Here are some examples of IDNs
I have just seen the reference to Punycode.
However, all of these Punycode forms are in the public suffix list: https://publicsuffix.org/list/public_suffix_list.dat
:(
One possible approach is to use the standard library's url.Parse(string) function and the Hostname() method of the *url.URL it returns, along with a regular expression that matches sequences of Unicode letters, numbers, and marks separated by dots (as represented in the sample data set).
For example:
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// domainNamePattern matches dot-separated labels made of Unicode letters,
// marks, and numbers (plus a few ASCII symbols), ending in a symbol-free label.
var domainNamePattern = regexp.MustCompile(`^([\p{L}\p{M}\p{N}_%+-]+\.)+[\p{L}\p{M}\p{N}]+$`)

func main() {
	ss := []string{
		`https://evertpot.com/internationalized-domain-names-are-you-ready/`,
		`http://bogus!.com`,
		`https://foo1.bar2.com.gah.zip/`,
		`http://مثال.إختبار`,
		`http://例子.测试`,
		`http://例子.測試`,
		`http://παράδειγμα.δοκιμή`,
		`http://उदाहरण.परीक्षा`,
		`http://例え.テスト`,
		`http://실례.테스트`,
		`http://مثال.آزمایشی`,
		`http://пример.испытание`,
	}
	for _, s := range ss {
		u, err := url.Parse(s)
		if err != nil || !domainNamePattern.MatchString(u.Hostname()) {
			// Report the hostname if parsing succeeded, otherwise the raw input.
			bogusPart := s
			if err == nil {
				bogusPart = u.Hostname()
			}
			fmt.Printf("ERROR: invalid URL or hostname %q\n", bogusPart)
			continue
		}
		fmt.Printf("OK: hostname=%q\n", u.Hostname())
	}
}
// OK: hostname="evertpot.com"
// ERROR: invalid URL or hostname "bogus!.com"
// OK: hostname="foo1.bar2.com.gah.zip"
// OK: hostname="مثال.إختبار"
// OK: hostname="例子.测试"
// OK: hostname="例子.測試"
// OK: hostname="παράδειγμα.δοκιμή"
// OK: hostname="उदाहरण.परीक्षा"
// OK: hostname="例え.テスト"
// OK: hostname="실례.테스트"
// OK: hostname="مثال.آزمایشی"
// OK: hostname="пример.испытание"
Of course, more care should be taken to build the regular expression so that it conforms to the relevant specifications (for instance, the pattern above also accepts "_", "%", and "+", and lets a label begin or end with a hyphen), but this example should be a good starting point.
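As one way to tighten the check, the hostname can be split into labels and each label validated on its own. The sketch below is only an illustration under a few assumptions: validLabel and validHostname are hypothetical helper names, hyphens are the only symbol allowed inside a label, labels may not start or end with a hyphen, and at least two labels are required. It does not check length limits, which apply to the encoded (Punycode) form of the name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// validLabel (hypothetical helper) reports whether a single label is
// non-empty, does not start or end with a hyphen, and contains only
// Unicode letters, marks, decimal digits, or hyphens.
func validLabel(label string) bool {
	if label == "" || strings.HasPrefix(label, "-") || strings.HasSuffix(label, "-") {
		return false
	}
	for _, r := range label {
		if r != '-' && !unicode.IsLetter(r) && !unicode.IsMark(r) && !unicode.IsDigit(r) {
			return false
		}
	}
	return true
}

// validHostname (hypothetical helper) requires at least two
// dot-separated labels, each of which passes validLabel.
func validHostname(host string) bool {
	labels := strings.Split(host, ".")
	if len(labels) < 2 {
		return false
	}
	for _, l := range labels {
		if !validLabel(l) {
			return false
		}
	}
	return true
}

func main() {
	hosts := []string{"例子.测试", "foo1.bar2.com", "-bad.com", "bogus!.com", "a..b"}
	for _, h := range hosts {
		fmt.Printf("%q valid=%v\n", h, validHostname(h))
	}
}
```

In the loop from the answer above, validHostname(u.Hostname()) could stand in for domainNamePattern.MatchString(u.Hostname()). This is still looser than the IDNA rules, so treat it as a first-pass filter rather than a full validator.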