I have a string in my SQL database that came from a user:
$str ='<h2 contenteditable="true">I am a not a good user <script>alert("hacked") </script> </h2>';
If I echo it as it is, that is not safe, so I use htmlspecialchars() to escape the special HTML characters:
echo htmlspecialchars($str);
This protects me from script injection, but I want to keep other tags (like <h2>) as they are, unchanged. Is there a way to escape only specific tags using htmlspecialchars()?
I think strip_tags() is what you are looking for. You can pass the allowed tags as the second parameter.
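As a minimal sketch of that suggestion, the snippet below runs the string from the question through strip_tags() with <h2> as the only allowed tag. The variable names $clean and $withHandler are only illustrative. Note that strip_tags() removes the disallowed tags but keeps the text between them, and it leaves attributes on allowed tags untouched, so an event handler such as onclick would survive.

```php
<?php
$str = '<h2 contenteditable="true">I am a not a good user <script>alert("hacked") </script> </h2>';

// Keep only <h2>; the <script> tags are removed, but the text inside them stays.
$clean = strip_tags($str, '<h2>');
echo $clean, PHP_EOL;

// Attributes on allowed tags are NOT filtered, so this handler is kept as-is.
$withHandler = strip_tags('<h2 onclick="alert(1)">Hi</h2><script>x</script>', '<h2>');
echo $withHandler, PHP_EOL;
```

Because attributes pass through, allowing tags this way is not enough on its own for untrusted input.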
Check out this function from the PHP docs:
$strippedinput = strip_tags_attributes($nonverifiedinput, "<p><br><h1><h2><h3><a><img>", "class,style");

function strip_tags_attributes($string, $allowtags = null, $allowattributes = null) {
    // Remove every tag that is not listed in $allowtags.
    $string = strip_tags($string, $allowtags);
    if (!is_null($allowattributes)) {
        if (!is_array($allowattributes)) $allowattributes = explode(",", $allowattributes);
        // Build one negative lookbehind per allowed attribute name, e.g. (?<!class)(?<!style).
        if (is_array($allowattributes)) $allowattributes = implode(")(?<!", $allowattributes);
        if (strlen($allowattributes) > 0) $allowattributes = "(?<!" . $allowattributes . ")";
        // create_function() was removed in PHP 8.0, so a closure is used instead.
        $string = preg_replace_callback("/<[^>]*>/i", function ($matches) use ($allowattributes) {
            // Drop each quoted attribute whose name is not in the allowed list.
            return preg_replace('/ [^ =]*' . $allowattributes . '=("[^"]*"|\'[^\']*\')/i', "", $matches[0]);
        }, $string);
    }
    return $string;
}
As Gerrit0 pointed out, you shouldn't use regex to parse HTML.
I was about to propose something very basic with regular expressions, but I found this instead:
https://stackoverflow.com/a/7131156/6219628
After reading more of the docs, I didn't find any way to make htmlspecialchars() ignore specific tags, which isn't surprising.
EDIT: And since using regex to parse HTML seems to be evil, you may appreciate reading this bulky answer :) https://stackoverflow.com/a/1732454/6219628
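Since htmlspecialchars() has no option for this, one possible workaround is to escape everything first and then restore only exact, attribute-free tags from an allowlist. The sketch below does that; escape_except_tags() is a hypothetical helper name, not a PHP built-in. It does not parse HTML: it only matches already-escaped bare tags such as &lt;h2&gt;, so a tag carrying attributes (like the contenteditable one in the question) stays escaped.

```php
<?php
// Hypothetical helper: escape all HTML, then un-escape bare allowlisted tags only.
function escape_except_tags(string $html, array $allowedTags): string
{
    $escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');
    if ($allowedTags === []) {
        return $escaped;
    }
    // Build an alternation such as "h2|p" from the allowed tag names.
    $names = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '/');
    }, $allowedTags));
    // Restore only "<tag>" and "</tag>" with no attributes.
    return preg_replace('/&lt;(\/?)(' . $names . ')&gt;/i', '<$1$2>', $escaped);
}

$str = '<h2>I am not a good user <script>alert("hacked")</script></h2>';
echo escape_except_tags($str, ['h2']), PHP_EOL;
```

Here the <h2> tags are output as real tags, while the <script> element is shown as escaped text. For anything beyond a few bare tags, a dedicated sanitizer is the safer choice.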
Note that just removing the <script> tag isn't sufficient; there are many other ways that users can inject malicious content into your site.
If you want to restrict the HTML tags that users can input, use a tool like HTML Purifier, which uses a whitelist of allowable tags and attributes.
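A minimal sketch of what that could look like, assuming HTML Purifier has been installed with Composer (package ezyang/htmlpurifier) and that vendor/autoload.php sits next to the script; both the install path and the choice of allowed tags are assumptions for illustration:

```php
<?php
// Assumption: HTML Purifier was installed via Composer in this directory.
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();
// Allowlist: only <h2> and <p>, with no attributes permitted.
$config->set('HTML.Allowed', 'h2,p');
$purifier = new HTMLPurifier($config);

$str = '<h2 contenteditable="true">I am a not a good user <script>alert("hacked") </script> </h2>';

// Anything outside the allowlist is removed before output.
$clean = $purifier->purify($str);
echo $clean, PHP_EOL;
```

To keep specific attributes, HTML.Allowed also accepts entries such as a[href]; anything not listed is dropped.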