I have several HTML pages that each contain a comment like this:
<!-- ID: 123456 -->
What I need is a PHP script that can pull that ID number. I have tried the following:
if (preg_match('#^<!--(.*?)-->#i', $output)) {
    echo "A match was found.";
} else {
    echo array_flip(get_defined_constants(true)['pcre'])[preg_last_error()];
    echo "No match found.";
}
That always gives "No match found", with no error reported. I have also tried preg_match_all, with the same result. The only thing I have found to work is splitting the page into an array on spaces, but that is very time-consuming and a waste of processor power.
For reference, I have looked and tried just about every suggestion on these pages:
Explode string by one or more spaces or tabs
http://php.net/manual/en/function.preg-split.php
How to extract html comments and all html contained by node?
How about trying this:
<!-- ID: ([\w ]+) -->
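This matches the literal text from your example and captures the ID (any run of word characters and spaces) in a numbered group, which you can then read from the match results.
PS: Remember to escape any character in the pattern that is special in regex or that collides with your chosen delimiter.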
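For illustration, here is a minimal sketch of how such a pattern might be used with preg_match. It assumes the ID is always numeric (so it uses \d+ rather than [\w ]+) and that $output holds the page contents; the sample string is made up. Note that the pattern in the question starts with ^, which only matches at the very beginning of the string, so it would fail whenever the comment is not the first thing in the page.

```php
<?php
// Hypothetical sample input; in practice $output would hold the page contents.
$output = "<html>\n<body>\n<!-- ID: 123456 -->\n</body>\n</html>";

$id = null;

// No ^ anchor, so the comment may appear anywhere in the string.
// \s* tolerates variable spacing; (\d+) captures digits only (an assumption).
if (preg_match('/<!--\s*ID:\s*(\d+)\s*-->/', $output, $matches)) {
    $id = $matches[1]; // contents of the first capturing group
    echo $id, PHP_EOL;
} else {
    echo "No match found.", PHP_EOL;
}
```

The third argument to preg_match receives the match array: index 0 is the whole match and index 1 is the first capturing group.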
First, treat the HTML file as a plain text file, because you only want to read a piece of text from it.
test.html
<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>
<p>This is a test HTML page</p>
<!-- ID: 123456 -->
</body>
</html>
PHP script that fetches the ID from the HTML file
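<?php
$fileName = 'test.html';
$content = file_get_contents($fileName);
$start = '<!-- ID:';
$end = '-->';

// Return the text between the first $start marker and the next $end marker,
// or an empty string if $start is not found.
function getBetween($content, $start, $end) {
    $r = explode($start, $content);
    if (isset($r[1])) {
        $r = explode($end, $r[1]);
        return $r[0];
    }
    return '';
}

// Strip the surrounding whitespace from the extracted ID.
echo trim(getBetween($content, $start, $end));
?>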
To extract information from structured data (such as HTML, XML, or JSON), use the appropriate parser. For HTML, that means DOMDocument, with DOMXPath to query the DOM tree:
$html = <<<'EOD'
<script>var a='<!-- ID: avoid_this --> and that <!-- ID: 666 -->';</script>
blahblah<!-- ID: 123456 -->blahblah
EOD;
$query = '//comment()[starts-with(., " ID: ")]';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);
$nodeList = $xp->query($query);
foreach ($nodeList as $node) {
    // Drop the leading " ID: " (5 characters) and the trailing space.
    echo substr($node->textContent, 5, -1);
}
Feel free to check the result afterwards with is_numeric or a regex. You can also register your own PHP function and use it in the XPath query: http://php.net/manual/en/domxpath.registerphpfunctions.php
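As a rough sketch of that last idea, the snippet below registers is_numeric and calls it from inside the XPath predicate, so that only comments with a numeric ID are selected. The sample markup and the variable names ($html, $ids) are made up for illustration, and the use of normalize-space with substring-after is just one possible way to isolate the ID.

```php
<?php
// Hypothetical sample markup: one numeric ID and one non-numeric ID.
$html = '<p>blah</p><!-- ID: 123456 --><p>blah</p><!-- ID: abc -->';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument;
$dom->loadHTML($html);

$xp = new DOMXPath($dom);
// The "php" prefix must be bound to this namespace before PHP functions can be called.
$xp->registerNamespace('php', 'http://php.net/xpath');
$xp->registerPhpFunctions('is_numeric'); // expose only this one function

// Select comments whose text after "ID:" (whitespace trimmed) is numeric.
$query = '//comment()[php:functionString("is_numeric", '
       . 'normalize-space(substring-after(., "ID:")))]';

$ids = [];
foreach ($xp->query($query) as $node) {
    $ids[] = trim(substr($node->textContent, strlen(' ID:')));
}

echo implode(',', $ids), PHP_EOL;
```

Passing a function name to registerPhpFunctions limits what the XPath expression can call, which is safer than registering every PHP function.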