PHP / HTML comment tags

I have several HTML pages containing comments that look like this:

<!-- ID: 123456 -->

What I need is a PHP script that can pull that ID number. I have tried the following:

if (preg_match('#^<!--(.*?)-->#i', $output)) {
    echo "A match was found.";
} else {
    echo array_flip(get_defined_constants(true)['pcre'])[preg_last_error()];
    echo "No match found.";
}

That always gives "No match found", with no error reported. I have also tried preg_match_all, with the same result. The only thing I have found to work is splitting the page into an array on spaces, but that is very time-consuming and a waste of processing power.

For reference, I have looked and tried just about every suggestion on these pages:

Explode string by one or more spaces or tabs

http://php.net/manual/en/function.preg-split.php

How to extract html comments and all html contained by node?

How about trying this:

<!-- ID: ([\w ]+) -->

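This matches the literal parts of your example and captures the ID in a group, which you can read back as numbered group 1. Note that [\w ] accepts letters, digits, underscores and spaces, not only digits; use \d+ instead if the ID is always numeric.

PS: Wrap the pattern in delimiters, and escape the delimiter character if it also appears inside the pattern.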
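The pattern in the question most likely fails because ^ (without the m modifier) only matches at the very start of the subject string, and the comment is not the first thing in the page. Here is a minimal sketch of the capture-group approach; the sample $output string is an assumption standing in for the real page source, and the pattern assumes the ID is numeric:

```php
<?php
// Hypothetical sample input; in practice $output would hold the page source.
$output = "<html>\n<body>\n<p>Test</p>\n<!-- ID: 123456 -->\n</body>\n</html>";

// No ^ anchor: the comment may appear anywhere in the document.
// \s* tolerates variable whitespace; (\d+) captures a numeric ID in group 1.
if (preg_match('~<!--\s*ID:\s*(\d+)\s*-->~', $output, $matches)) {
    $id = $matches[1];
    echo "ID: $id\n";
} else {
    $id = null;
    echo "No match found.\n";
}
```

Using ~ as the delimiter means nothing in this pattern needs escaping. If a page can contain several such comments, preg_match_all with the same pattern collects all of them.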

First, think of the HTML file as a plain text file, because you only want to read some text from the .html file.

test.html

<!DOCTYPE html>
<html>
<head>
    <title></title>
</head>
<body>
<p>This is a test HTML page</p>
<!-- ID: 123456 -->
</body>
</html>

PHP script that fetches the ID from the HTML file

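<?php

// Return the text between the first $start marker and the next $end marker,
// or an empty string if $start is not found.
function getBetween($content, $start, $end) {
    $r = explode($start, $content);

    if (isset($r[1])) {
        $r = explode($end, $r[1]);
        return $r[0];
    }
    return '';
}

$fileName = 'test.html';
$content = file_get_contents($fileName);
$start = '<!-- ID:';
$end   = '-->';

// Strip the spaces around the ID before printing it.
echo str_replace(' ', '', getBetween($content, $start, $end));

?>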

To extract information from structured data (such as HTML, XML, JSON...), use a proper parser. Here, DOMDocument builds the DOM tree and DOMXPath queries it:

$html = <<<'EOD'
<script>var a='<!-- ID: avoid_this --> and that <!-- ID: 666 -->';</script>
blahblah<!-- ID: 123456 -->blahblah
EOD;

$query = '//comment()[starts-with(., " ID: ")]';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

$nodeList = $xp->query($query);

foreach ($nodeList as $node) {
    echo substr($node->textContent, 5, -1);
}

Feel free to validate the result afterwards with is_numeric or a regex. You can also register your own PHP function and call it from the XPath query: http://php.net/manual/en/domxpath.registerphpfunctions.php
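As a rough sketch of that validation step, the DOM approach can be wrapped in a function that keeps only numeric IDs. The function name extractCommentIds and the extra "not a number" comment in the sample are assumptions made for illustration, and ctype_digit is used here in place of is_numeric so that only plain digit strings pass:

```php
<?php
// Hypothetical helper: returns the numeric IDs found in HTML comments.
function extractCommentIds(string $html): array
{
    // Collect parser warnings internally instead of emitting them,
    // since real-world markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);
    $ids = [];
    foreach ($xp->query('//comment()[starts-with(., " ID: ")]') as $node) {
        // Drop the 5-character " ID: " prefix, then trim whitespace.
        $candidate = trim(substr($node->textContent, 5));
        if (ctype_digit($candidate)) {
            $ids[] = $candidate;
        }
    }
    return $ids;
}

$html = <<<'EOD'
<script>var a='<!-- ID: avoid_this --> and that <!-- ID: 666 -->';</script>
blahblah<!-- ID: 123456 -->blahblah<!-- ID: not a number -->
EOD;

$ids = extractCommentIds($html);
print_r($ids);
```

The comment-like text inside the script element is script content rather than a comment node, so the XPath query does not return it; the non-numeric comment is dropped by the ctype_digit check.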