According to http://php.net/htmlspecialchars the default value for the charset argument changed from ISO-8859-1 to UTF-8 in PHP 5.4.
If you follow the discussion of this bug, https://bugs.php.net/bug.php?id=61354, you quickly realize that this led to major difficulties in maintaining legacy PHP code. We've run into similar problems.
Explanations like "most people use UTF-8" in this post http://nikic.github.io/2012/01/28/htmlspecialchars-improvements-in-PHP-5-4.html are really weird.
As far as I know, htmlspecialchars() escapes everything correctly in UTF-8 text, even if ISO-8859-1 is set. People using non-ASCII charsets had to set the parameter... ok, but they are not affected by a change to the default behavior, either. I've written a lot of UTF-8 code and never had a problem with htmlspecialchars() using ISO-8859-1 before.
So why change this behavior? Or am I overlooking some security issues? I just want to understand it (no code solution needed!).
The default was changed from ISO-8859-1 to UTF-8 because UTF-8 is the more commonly used encoding.
ISO-8859-1 covers only the Latin characters used by Western European languages, while UTF-8 can encode every Unicode character. This is why UTF-8 is usually preferred over ISO-8859-1.
As for why they chose to break compatibility: I guess they thought it was a good idea. Maybe they underestimated the impact this would have. I can see this being a bit of a snag for entry-level developers.
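To make the compatibility break concrete, here is a minimal sketch of what typically goes wrong with legacy data. It assumes a string stored as ISO-8859-1 (the sample value is made up for illustration) and passes the flags and charset explicitly, so the result does not depend on which PHP version's defaults are in effect:

```php
<?php
// "é" in ISO-8859-1 is the single byte 0xE9, which is not valid UTF-8 on its own.
$latin1 = "<b>caf\xE9";

$flags = ENT_COMPAT | ENT_HTML401;

// Legacy charset: bytes above 0x7F pass through untouched, only < > & " are escaped.
$legacy = htmlspecialchars($latin1, $flags, 'ISO-8859-1');

// UTF-8 (the hard-coded default in 5.4/5.5): the input is validated,
// and an invalid byte sequence makes the function return an empty string.
$strict = htmlspecialchars($latin1, $flags, 'UTF-8');

// ENT_SUBSTITUTE keeps the rest of the string and replaces the invalid
// byte with U+FFFD (bytes EF BF BD) instead of returning nothing.
$subst = htmlspecialchars($latin1, $flags | ENT_SUBSTITUTE, 'UTF-8');

var_dump($legacy === "&lt;b&gt;caf\xE9");
var_dump($strict === '');
var_dump(strpos($subst, "\xEF\xBF\xBD") !== false);
```

So the escaping itself is identical for both charsets; the difference is that UTF-8 input gets validated, and legacy ISO-8859-1 strings containing non-ASCII bytes fail that validation.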
This is a potential fix for this breaking change:
function myhtmlspecialchars($string, $flags = null, $encoding = "ISO-8859-1", $double_encode = true) {
    // Fall back to the flags htmlspecialchars() used by default in PHP 5.4.
    if ($flags === null) {
        $flags = ENT_COMPAT | ENT_HTML401;
    }
    return htmlspecialchars($string, $flags, $encoding, $double_encode);
}
And then simply replace htmlspecialchars by myhtmlspecialchars in your code.
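As of version 5.6.0 the default is taken from the default_charset ini setting (which itself defaults to UTF-8), so the old behaviour can be restored through configuration. The hard-coded UTF-8 default therefore only impacts applications running on versions 5.4.x and 5.5.x.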
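On 5.6.0 and later, the configuration route could look roughly like the sketch below. It assumes that default_charset may be changed at runtime with ini_set() and that the internal_encoding setting is left empty; the sample string is made up for illustration. Note that default_charset also influences the charset PHP sends in the Content-Type header, so setting it in php.ini for the whole application is probably the cleaner choice:

```php
<?php
// Assumption: PHP 5.6 or later. On 5.4/5.5 htmlspecialchars() ignores this setting.
ini_set('default_charset', 'ISO-8859-1');

// ISO-8859-1 input: 0xE9 is "é" and would be rejected as invalid UTF-8.
$latin1 = "<b>caf\xE9";

// No encoding argument: the function falls back to default_charset.
$escaped = htmlspecialchars($latin1, ENT_COMPAT | ENT_HTML401);

var_dump($escaped === "&lt;b&gt;caf\xE9");
```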