HTMLPurifier issue with href links whose host starts with a digit

I ran into an issue with the PHP HTMLPurifier library when purifying the following input string:

    <a href="http://1plusone/com/Update">Update</a>

For this input, the purified output was:

    <a href="/com/Update">Update</a>

I went through the documentation but could not find a solution for this issue.

Source Code:

    require_once("/html_purifier/library/HTMLPurifier.auto.php");
    $config = HTMLPurifier_Config::createDefault();
    $text = "<a href=\"http://1plusone/com/Update\">Update</a>";
    $oPurifier = new HTMLPurifier($config);
    $purifiedHtml = $oPurifier->purify($text);
    echo $purifiedHtml; // prints <a href="/com/Update">Update</a>

I also tried the HTML Purifier live demo, and it gives the same result.

Please help.

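It appears that HTML Purifier rejects host names whose last label starts with a digit. In your URL, `1plusone` is the only label, so it is treated as the top label and fails the check, which is why the host is dropped from the output. The relevant code is in HTMLPurifier/AttrDef/URI/Host.php: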

    // The productions describing this are:
    $a   = '[a-z]';     // alpha
    $an  = '[a-z0-9]';  // alphanum
    $and = "[a-z0-9-$underscore]"; // alphanum | "-"
    // domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
    $domainlabel = "$an($and*$an)?";
    // toplabel    = alpha | alpha *( alphanum | "-" ) alphanum
    $toplabel = "$a($and*$an)?";
    // hostname    = *( domainlabel "." ) toplabel [ "." ]
    if (preg_match("/^($domainlabel\.)*$toplabel\.?$/i", $string)) {
        return $string;
    }
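
To see which host names this pattern accepts, the snippet can be lifted into a standalone function and run outside the library. This is only a sketch for experimentation: `matchesHostPattern` is a hypothetical helper name, and `$underscore` is assumed to be an empty string here (that is, underscores are not allowed).

```php
<?php
// Standalone copy of the hostname pattern quoted above.
// Assumption: $underscore is '' (underscores in host names not allowed).
function matchesHostPattern(string $host): bool
{
    $underscore = '';
    $a   = '[a-z]';                  // alpha
    $an  = '[a-z0-9]';               // alphanum
    $and = "[a-z0-9-$underscore]";   // alphanum | "-"
    $domainlabel = "$an($and*$an)?"; // may start with a digit
    $toplabel    = "$a($and*$an)?";  // must start with a letter
    return preg_match("/^($domainlabel\.)*$toplabel\.?$/i", $host) === 1;
}

var_dump(matchesHostPattern('1plusone'));     // bool(false): only label starts with a digit
var_dump(matchesHostPattern('plusone1'));     // bool(true): digit is not in first position
var_dump(matchesHostPattern('1plusone.com')); // bool(true): top label is "com"
```

The third case shows that a leading digit is only a problem in the last label; earlier labels go through `$domainlabel`, which allows it.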

A simple fix would probably be to patch this pattern to be more permissive. I don't know if there is a more recent RFC that allows what you describe.
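
As a rough illustration of what "more permissive" could look like, here is one possible relaxed pattern. It is not HTML Purifier's code, and `matchesRelaxedHostPattern` is a hypothetical name. It assumes you want the last label to be allowed to start with a digit, while still requiring at least one letter in it so that purely numeric hosts do not match. Whether that trade-off is acceptable for your application is something you would need to decide before patching the library.

```php
<?php
// Hypothetical relaxed variant of the host pattern (not the library's code).
// The last label may start with a digit but must contain at least one letter.
function matchesRelaxedHostPattern(string $host): bool
{
    $an  = '[a-z0-9]';  // alphanum
    $and = '[a-z0-9-]'; // alphanum | "-"
    $domainlabel = "$an($and*$an)?";
    // Lookahead: somewhere in the last label there has to be a letter.
    $toplabel = "(?=[0-9-]*[a-z])$domainlabel";
    return preg_match("/^($domainlabel\.)*$toplabel\.?$/i", $host) === 1;
}

var_dump(matchesRelaxedHostPattern('1plusone')); // bool(true)
var_dump(matchesRelaxedHostPattern('1.2.3.4'));  // bool(false): last label has no letter
```

If you do patch Host.php along these lines, keep in mind that the change will be lost on the next library upgrade.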