Need to parse an HTML document for links - use a library like html5lib or something similar?

I'm very new to building web pages and am currently working on a website that needs to change link colours according to the destination page. The links will be sorted into different classes (e.g. good, bad, neutral) based on criteria supplied by the user: e.g. links to content the user would find interesting are coloured blue, links to content the user (presumably) doesn't want to see are coloured like normal text, etc.

I reckon I need a way to parse the webpage for links to the content (stored in a MySQL database) and change the colours of all the links on the page (so I need to be able to change the link classes in the HTML as well) before outputting the adapted page to the user. I read that regex is not a good way to find those links, so should I use a library, and if so, is html5lib good for what I'm doing?

There's no need to complicate things for yourself with PHP HTML parsers, which can mangle and forcefully "repair" your input HTML.

Here's how you can combine PHP with JavaScript; this is a complete, working and tested solution:

<?php
$arrBadLinks=array(
    "http://localhost/something.png",
    "https://www.apple.com/something.png",
);
$arrNeutralLinks=array(
    "http://www.microsoft.com/index.aspx",
    "ftp://samewebsiteasyours.com",
    "ftp://samewebsiteasyours.net/file.txt",
);
?>
<html>
    <head>
        <script>
        function colorizeLinks()
        {
            var arrBadLinks=<?php echo json_encode($arrBadLinks);?>;
            var arrNeutralLinks=<?php echo json_encode($arrNeutralLinks);?>;

            var nodeList=document.getElementsByTagName("*");
            for(var n=nodeList.length-1; n>=0; n--)
            {
                var el=nodeList[n];

                if(el.nodeName=="A")
                {
                    if(arrBadLinks.indexOf(el.href)>-1)
                        el.style.color="red";
                    else if(arrNeutralLinks.indexOf(el.href)>-1)
                        el.style.color="green";
                    else
                        el.style.color="blue";
                }
            }
        }

        if(window.addEventListener)
            window.addEventListener("load", colorizeLinks, false);
        else if (window.attachEvent)
            window.attachEvent("onload", colorizeLinks);
        </script>
    </head>
    <body>
        <p>
            <a href="http://www.microsoft.com/index.aspx">Neutral www.microsoft.com/index.aspx</a>
        </p>
        <p>
            <a href="http://localhost/something.png">Bad http://localhost/something.png</a>
        </p>
    </body>
</html>

This does not work for relative URLs: the comparison is an exact string match against absolute URLs, so make sure the URLs you compare are absolute, or the comparison will fail (alternatively, update the code to fill in the http://current-domain.xxx prefix for any relative URL).
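One possible way to handle the relative-URL case is to resolve every URL against the page's address before comparing. The following is only a minimal sketch, not part of the tested solution above: `toAbsolute` and `classifyLink` are hypothetical helper names, the `base` argument is an assumption standing in for the current page's URL, and it relies on the standard `URL` constructor being available (modern browsers and Node.js).

```javascript
// Hypothetical helper: resolve href against base, or return null if it
// cannot be parsed as a URL.
function toAbsolute(href, base) {
    try {
        return new URL(href, base).href;
    } catch (e) {
        return null;
    }
}

// Hypothetical helper: returns "bad", "neutral" or "good" for a link.
// Both the link and the list entries are made absolute first, so relative
// entries on either side still match.
function classifyLink(href, base, badLinks, neutralLinks) {
    var abs = toAbsolute(href, base);
    var resolve = function (u) { return toAbsolute(u, base); };

    if (abs !== null && badLinks.map(resolve).indexOf(abs) > -1)
        return "bad";
    if (abs !== null && neutralLinks.map(resolve).indexOf(abs) > -1)
        return "neutral";
    return "good";
}
```

In the page, this could be called as `classifyLink(el.getAttribute("href"), document.baseURI, arrBadLinks, arrNeutralLinks)`, and the result could be assigned to `el.className` if you would rather style the links through CSS classes than set `el.style.color` directly.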