Possible Duplicate:
Sanitization of User-Supplied Regular Expressions in PHP
Let's say you want to let users search for something and your search function has the ability to accept regular expressions.
Is it OK to let site users search by regexes that they post? From a user's point of view, I'd love a site that would let me do that :D
Is there any security risk involved? How can I sanitize a regex?
The main risk is that a regular expression is complex enough to run for ages or hit the recursion limit of the engine. See this article. Other risks arise if you let your users use regex replacement in the wrong places, because that introduces the risk of code injection. But matching itself cannot really do any harm other than DoSing your server.
There has been a question recently on how to recognize these dangerous regexes and the consensus was that it is not generally possible. See the question.
You are probably best off restricting the time your regex search can take and aborting it if it takes too long.
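To make the time limit concrete, here is a minimal sketch in Python (my choice for illustration; the question does not name a language for the search code). Python's standard re module has no built-in timeout, so this sketch assumes you can afford to run each search in a child process and kill it when it overruns. The names search_with_timeout and timeout_s are made up for this example.

```python
import json
import subprocess
import sys

# Child program: reads {"pattern": ..., "text": ...} as JSON on stdin and
# prints a JSON result. It runs in its own process so that it can be killed.
_CHILD = r"""
import json, re, sys
req = json.load(sys.stdin)
try:
    found = re.search(req["pattern"], req["text"]) is not None
    print(json.dumps({"status": "ok", "found": found}))
except re.error as exc:
    print(json.dumps({"status": "invalid", "error": str(exc)}))
"""


def search_with_timeout(pattern, text, timeout_s=1.0):
    """Run a user-supplied regex against text with a hard time limit.

    Returns "match", "no match", "error" (invalid pattern or crashed child)
    or "timeout". All names here are hypothetical, not a library API.
    """
    payload = json.dumps({"pattern": pattern, "text": text})
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD],
            input=payload,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode != 0:
        # The child died, e.g. because the engine hit its recursion limit.
        return "error"
    result = json.loads(proc.stdout)
    if result["status"] != "ok":
        return "error"
    return "match" if result["found"] else "no match"
```

Calling search_with_timeout(r"(a+)+$", "a" * 64 + "!", timeout_s=1) should give up after about a second instead of backtracking indefinitely. Starting a process per search is expensive, so in practice you would probably want a worker pool or an engine with native timeout support; this only shows the idea.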
If the regex doesn't affect the program code, there's no real security risk. I believe the reason it's often not implemented is that it is a costly procedure. I have never seen it used in SQL, so you would need to fetch ALL the content being searched and then run the regex on it, rather than relying on the simplicity of SQL LIKE or exact matching, etc.
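The cost difference can be sketched as follows, assuming Python with an in-memory SQLite database. The posts table, its rows, and the function names are invented for the example. With LIKE the database does the filtering; with a regex applied in application code, every row has to be pulled out first.

```python
import re
import sqlite3

# Hypothetical sample data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (title) VALUES (?)",
    [("Regex basics",), ("SQL tips",), ("Advanced regex tricks",)],
)


def search_like(conn, term):
    """Substring search done by the database via LIKE."""
    # Escape LIKE metacharacters so the user's term is taken literally.
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT title FROM posts WHERE title LIKE ? ESCAPE '\\' ORDER BY id",
        (f"%{escaped}%",),
    )
    return [title for (title,) in rows]


def search_regex(conn, pattern):
    """Regex search done in the application: every row is fetched."""
    compiled = re.compile(pattern)
    rows = conn.execute("SELECT title FROM posts ORDER BY id")
    return [title for (title,) in rows if compiled.search(title)]
```

Here search_like(conn, "regex") lets SQLite return only the matching titles, while search_regex(conn, r"^Adv\w+ regex") has to read the whole table before it can filter anything. On three rows that is irrelevant; on a large table it is the cost described above.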
I don't see a direct security risk, but I do see performance-related issues that can easily cause some serious downtime. There are two flavors of this: patterns that are too complex and patterns that are too broad. Consider, for example, a query like .* with a big database; I've seen that even a couple of those can easily bring down systems.
I would execute user searches against something other than the actual live database, preferably against cached results in memory, where this should not matter as much.
Or just implement only wildcards (*, ?), as suggested in the comments. They're both more user-friendly and easier to deal with.
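A wildcard-only search could look roughly like this in Python. This is a sketch under my own assumptions: * means "any run of characters", ? means "exactly one character", and everything else is matched literally. The function names are hypothetical.

```python
import re


def wildcard_to_regex(query):
    """Translate a wildcard query into a compiled regex.

    Only '*' and '?' are special; every other character is escaped, so
    users cannot smuggle in groups or nested quantifiers.
    """
    parts = []
    for ch in query:
        if ch == "*":
            # Collapse runs of '*' so "a***b" does not stack up ".*.*.*".
            if not parts or parts[-1] != ".*":
                parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(query, text):
    """Return True if the whole text matches the wildcard query."""
    return wildcard_to_regex(query).fullmatch(text) is not None
```

For example, wildcard_match("re*x", "regex") and wildcard_match("re?ex", "regex") both succeed, while wildcard_match("(a+)+", "aaa") fails because the parentheses and plus signs are treated as plain characters. A query with many separate * characters can still be slow on long inputs, so a time or length limit is still worth having.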