Why was the default charset of htmlspecialchars() changed from ISO-8859-1 to UTF-8?

According to http://php.net/htmlspecialchars the default value for the charset argument changed from ISO-8859-1 to UTF-8 in PHP 5.4.

If you follow the discussion of this bug, https://bugs.php.net/bug.php?id=61354, you quickly realize that this led to major difficulties in maintaining legacy PHP code. We've run into similar problems.

Explanations like "most people use UTF-8" in this post http://nikic.github.io/2012/01/28/htmlspecialchars-improvements-in-PHP-5-4.html are really weird.

As far as I know, htmlspecialchars() escapes the special characters correctly in UTF-8 text, even if ISO-8859-1 is set. People using charsets that are not ASCII-compatible had to set the parameter anyway, so they are not affected by a change to the default behavior either. I've written a lot of UTF-8 code and never had a problem with htmlspecialchars() using ISO-8859-1 before.

So why change this behavior? Or am I overlooking some security issues? I just want to understand it (no code solution needed!).

The default was changed from ISO-8859-1 to UTF-8 because UTF-8 is more commonly used.

ISO-8859-1 covers only the Latin-1 (Western European) characters, while UTF-8 can encode every Unicode character. This is why UTF-8 is often preferred over ISO-8859-1.


As for why they chose to break compatibility: I guess they thought it was a good idea. Maybe they underestimated the impact this would have. I can see this being a bit of a snag for entry-level developers.
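To make the breakage concrete, here is a minimal sketch. The sample strings and variable names are made up for illustration. It relies on the documented behavior that htmlspecialchars() returns an empty string when the input is not valid in the given encoding, unless ENT_IGNORE or ENT_SUBSTITUTE is set. That is what legacy ISO-8859-1 strings run into once the default becomes UTF-8:

```php
<?php
// Illustrative inputs: "café <b>" encoded two different ways.
$latin1 = "caf\xE9 <b>";      // ISO-8859-1: é is the single byte 0xE9
$utf8   = "caf\xC3\xA9 <b>";  // UTF-8: é is the two bytes 0xC3 0xA9

// Flags are passed explicitly so the result does not depend on version defaults.
$latin1AsUtf8   = htmlspecialchars($latin1, ENT_COMPAT, 'UTF-8');
$latin1AsLatin1 = htmlspecialchars($latin1, ENT_COMPAT, 'ISO-8859-1');
$utf8AsLatin1   = htmlspecialchars($utf8, ENT_COMPAT, 'ISO-8859-1');

// 0xE9 followed by a space is not a valid UTF-8 sequence, so the result is empty.
var_dump($latin1AsUtf8 === '');
// With ISO-8859-1 every byte is valid; only the ASCII special characters are escaped.
var_dump($latin1AsLatin1 === "caf\xE9 &lt;b&gt;");
// UTF-8 input passes through the ISO-8859-1 setting with its bytes untouched.
var_dump($utf8AsLatin1 === "caf\xC3\xA9 &lt;b&gt;");
```

The last case is why UTF-8 code never noticed the old default, while the first case is why ISO-8859-1 code notices the new one.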

This is a potential fix for this breaking change:

// Wrapper that restores the pre-5.4 default encoding (ISO-8859-1).
// ENT_HTML401 only exists as of PHP 5.4, which is where this wrapper is needed.
function myhtmlspecialchars($string, $flags = null, $encoding = "ISO-8859-1", $double_encode = true) {
    if ($flags === null) { $flags = ENT_COMPAT | ENT_HTML401; }
    return htmlspecialchars($string, $flags, $encoding, $double_encode);
}

And then simply replace htmlspecialchars with myhtmlspecialchars in your code.


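As of PHP 5.6.0, the default encoding is taken from the default_charset ini setting (which itself defaults to UTF-8). So the hard-coded UTF-8 default only affects applications running on PHP 5.4.x or 5.5.x; on later versions you can set default_charset instead of touching every call.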
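A small sketch of what that looks like, assuming PHP 5.6 or later and a CLI or early-bootstrap context where ini_set() is allowed. The sample string and variable names are illustrative only:

```php
<?php
// Assumption: PHP >= 5.6, where an omitted $encoding falls back to default_charset.
ini_set('default_charset', 'ISO-8859-1');

$latin1 = "caf\xE9 <b>"; // ISO-8859-1 bytes, as legacy code would produce

// No encoding argument: htmlspecialchars() uses default_charset.
$viaDefault = htmlspecialchars($latin1, ENT_COMPAT);
// Explicit encoding: behaves the same on every PHP version.
$explicit = htmlspecialchars($latin1, ENT_COMPAT, 'ISO-8859-1');

var_dump($viaDefault === $explicit);
```

Passing the encoding explicitly remains the more portable choice, since it does not depend on the PHP version or on server configuration.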