I've looked everywhere: there are tons of regexes for http:// or www... URLs, but nothing for paths used within a server.
In my case I need to sanitize/validate a path like:
/folder1/folder2/.../file.ext
so that, e.g., paths like these are rejected:
/img/<"?">/
/img/content/.../file.ext<script>alert("Script")</script>
In other words, a valid path starts with "/", continues with any number of valid folder names, each followed by "/", and ends with a file name and an extension.
PHP's built-in FILTER_VALIDATE_URL and FILTER_SANITIZE_URL filters don't handle such a path as a URL, so I guess I have to use a regex:
filter_var($url, FILTER_VALIDATE_REGEXP, array("options"=>array("regexp"=>"/ ... /")));
Could one of the regex wizards please help me with this? Thank you.
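For illustration, here is a minimal sketch of that filter_var call with one possible pattern filled in. It assumes that a "valid" folder name is one or more ASCII letters, digits, underscores or hyphens, and that the file name is the same plus a single ".ext" suffix; that whitelist, the PATH_PATTERN constant and the isValidServerPath() name are all hypothetical choices, not a standard.

```php
<?php
// Hypothetical whitelist: folder names are [A-Za-z0-9_-]+, the file name is
// the same followed by a single ".ext". Dots are not allowed in folder names,
// so "." and ".." segments are rejected. \z anchors at the very end of the
// string (unlike $, which also matches before a trailing newline).
const PATH_PATTERN = '#^/(?:[A-Za-z0-9_-]+/)*[A-Za-z0-9_-]+\.[A-Za-z0-9]+\z#';

function isValidServerPath(string $path): bool
{
    // FILTER_VALIDATE_REGEXP returns the input string on a match, false otherwise.
    return filter_var(
        $path,
        FILTER_VALIDATE_REGEXP,
        ['options' => ['regexp' => PATH_PATTERN]]
    ) !== false;
}

$samples = [
    '/img/content/file.ext',
    '/img/<"?">/',
    '/img/content/file.ext<script>alert("Script")</script>',
];
foreach ($samples as $p) {
    echo $p, ' => ', isValidServerPath($p) ? 'valid' : 'invalid', "\n";
}
```

A whitelist like this is deliberately much stricter than what real file systems allow, which is the trade-off: it rejects some legal names in exchange for never letting markup or traversal segments through.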
Using regular expressions to validate URIs, e-mail addresses and other such complex identifiers is a hairy proposition: it requires you to read and fully understand the specs for every system involved, keep track of any changes to all of those systems, and update your code for as long as it's available on the net.
In short: it's a huge investment, and it's critical that you follow up on it.
That said, you can use FILTER_VALIDATE_URL on this by simply adding the file:// protocol prefix to the path:
php > $st = "file:///home/test";
php > var_dump (filter_var ($st, FILTER_VALIDATE_URL));
string(17) "file:///home/test"
php > $st2 = "/home/test";
php > var_dump (filter_var ($st2, FILTER_VALIDATE_URL));
bool(false)
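The same check can be wrapped in a small helper. This is a minimal sketch; the isLegalFilePath() name is hypothetical. It only accepts absolute paths, so that "file://" plus the path yields "file:///...", where the whole path lands in the URL's path component instead of the first folder being read as a host name.

```php
<?php
// Hypothetical helper: true if the absolute $path, turned into a file:// URL,
// passes PHP's built-in FILTER_VALIDATE_URL check.
function isLegalFilePath(string $path): bool
{
    // Require a leading "/" so "file://" . $path becomes "file:///..." (empty host).
    if ($path === '' || $path[0] !== '/') {
        return false;
    }
    return filter_var('file://' . $path, FILTER_VALIDATE_URL) !== false;
}

var_dump(isLegalFilePath('/home/test'));
var_dump(isLegalFilePath('home/test'));
```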
Once that is done, you know that the string adheres to the URL syntax of the file scheme, which describes a local file resource. If it doesn't, you can tell the user that the given path is not valid.
Then you can check whether the path exists and whether the user has access to it. You might also want to add further restrictions on the path name to avoid unforeseen problems with special characters. In any case, always use the appropriate output escaping for the target system.
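The existence and access check could look roughly like the sketch below. It assumes the user-supplied path is meant to be relative to a fixed base directory; the resolveInside() name and its $baseDir parameter are hypothetical. realpath() resolves ".", ".." and symlinks and returns false for paths that don't exist, so the resolved result can be compared against the base directory. Note that is_readable() checks access for the PHP process itself; application-level user permissions would be a separate check.

```php
<?php
// Hypothetical helper: returns the real path of $relative inside $baseDir if it
// is an existing, readable regular file there, or null otherwise.
// Assumes $baseDir is not the file system root.
function resolveInside(string $baseDir, string $relative): ?string
{
    $base = realpath($baseDir);
    if ($base === false) {
        return null; // base directory does not exist
    }
    // realpath() resolves "..", "." and symlinks; false if the target is missing.
    $full = realpath($base . DIRECTORY_SEPARATOR . ltrim($relative, '/'));
    if ($full === false) {
        return null;
    }
    // Reject anything that resolved to a location outside the base directory.
    if (strpos($full, $base . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    return (is_file($full) && is_readable($full)) ? $full : null;
}
```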
As for your supposedly invalid path name:
tmp$ touch '<"?">'
tmp$ ls -l
total 0
-rw-rw-r-- 1 christian christian 0 Oct 25 15:52 <"?">
A good example of why one should always use output escaping, btw.
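As a small sketch of what "escaping for the target system" means for that very file name, here are two standard PHP functions: htmlspecialchars() for an HTML target and escapeshellarg() for a POSIX shell target (its quoting differs on Windows).

```php
<?php
// The file name created with `touch` above: legal on Linux, but unsafe to
// echo into HTML or a shell command without escaping.
$name = '<"?">';

// HTML target: <, >, " (and ' because of ENT_QUOTES) become entities.
$html = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

// POSIX shell target: the argument is wrapped in single quotes.
$shell = escapeshellarg($name);

echo $html, "\n";  // prints &lt;&quot;?&quot;&gt;
echo $shell, "\n"; // prints '<"?">'
```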