I'm trying to validate a URL using Go's standard library. This is what my code currently looks like.
import (
"fmt"
"net/url"
)
func isValidURL(tocheck string) bool {
_, err := url.ParseRequestURI(tocheck)
return err == nil
}
func main() {
fmt.Println(isValidURL("google.com")) //returns false, expected true
fmt.Println(isValidURL("www.google.com")) //returns false, expected true
fmt.Println(isValidURL("google")) //returns false, expected false
fmt.Println(isValidURL("/google")) //returns true, expected false
}
All three examples print false, even though the first two should be true. I then tried appending https://
to the beginning of URLs that don't start with them, but then everything, like https://aaaa
is parsed as valid. What can I do to make sure it only returns true when the URL is actually valid?
You have confused domains with URLs, the domain is only a part of the URL.
Valid domain examples are: www.google.com
, localhost
and a.b.c.a.google.com
.
In order for a URL to be valid the scheme/protocol part (normally https://
) must be there, see syntax at Wikipedia for an easy explanation.
http://aaa
is a valid URL by the same rules as http://localhost
is valid
Most of those are domain names. https://aaaa
is a valid URL. /google
is not a URL, but it is acceptable to ParseRequestURI
because it also accepts absolute paths.
"rawurl is interpreted only as an absolute URI or an absolute path"
When you ask ParseRequestURI
you're asking for a strict syntax check of either an absolute URL or an absolute path. An absolute path is anything like /foo/bar
. What is and is not an absolute URL is covered by RFC 3986. The basic grammar for a URI is this.
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part = "//" authority path-abempty
/ path-absolute
/ path-rootless
/ path-empty
An "absolute URL" means the path
part is an absolute path or empty, so path-abempty
or path-absolute
above. http
and https
URLs can only be absolute. foo:bar/baz
is an example of a relative URL.
And here's an example.
foo://example.com:8042/over/there?name=ferret#nose
\_/ \______________/\_________/ \_________/ \__/
| | | | |
scheme authority path query fragment
| _____________________|__
/ \ / \
urn:example:animal:ferret:nose
google.com
has no scheme
, so it is not a URL. https://aaaa
has a scheme, https
and a hier-part
, //aaaa
, so it is a URL. It doesn't have a query
or fragement
, but they are optional.
Obviously this is a bit broad. In the real world you need to narrow down your requirements. Usually it's something like...
ParseRequestURI
because it can also be an absolute path.url.Scheme
. This will discard absolute paths.url.Host
.And any other checks you might want to do to restrict what you consider to be a valid URL.
So your full check might look like...
package main
import (
"fmt"
"net"
"net/url"
"errors"
)
func isValidURL(tocheck string) (bool, error) {
// Check it's an Absolute URL or absolute path
uri, err := url.ParseRequestURI(tocheck)
if err != nil {
return false, err
}
// Check it's an acceptable scheme
switch uri.Scheme {
case "http":
case "https":
default:
return false, errors.New("Invalid scheme")
}
// Check it's a valid domain name
_,err = net.LookupHost(uri.Host)
if err != nil {
return false, err
}
return true, nil
}
func main() {
// False, no scheme
fmt.Println(isValidURL("/google"))
// True, good scheme, good domain
fmt.Println(isValidURL("https://google.com"))
// False, bad domain
fmt.Println(isValidURL("http://halghalghlakdjfl.blarg"))
}