I would like to turn uppercase h1, h2, ... tags into capitalized text with PHP. I'm close, but not there yet. The snippet below does not turn the first character of "LOREM" into uppercase (probably because it tries to uppercase the '<'). It would be easy to modify the PHP callback function, but I would prefer to do this by modifying only the regex:
$var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";
$line = preg_replace_callback(
'/<h[1-9]>(.*)\>/i',
function ($matches) {
return ucfirst(strtolower($matches[0]));
},
$var
);
print($line);
Results in:
<h1>lorem ipsum dolores amet</h1>
THIS IS SOME TEXT
<h2>lorem ipsum dolores amet</h2>
Desired output:
<h1>Lorem ipsum dolores amet</h1>
THIS IS SOME TEXT
<H2>Lorem ipsum dolores amet</H2>
Use a DOMDocument
<?php
$var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";
$dom = new DOMDocument();
$dom->loadHTML($var);
$tags = array("h1", "h2");
//loop through all h1 and h2 tags
foreach ($tags as $tag) {
//get all elements of the current tag
$elements = $dom->getElementsByTagName($tag);
//if we found at least 1 element (empty() is always false for a DOMNodeList object, so check ->length)
if ($elements->length > 0) {
//loop through each element of the given tag
foreach ($elements as $element) {
//run ucfirst on the nodevalue
//which is equivalent to the "textContent" property of a DOM node
$element->nodeValue = ucfirst(strtolower($element->nodeValue));
}
}
}
$html = $dom->saveHTML();
//strip the doctype, <html> and <body> wrapper that saveHTML() adds
$html = str_replace("</body></html>", "", substr($html, strpos($html, "<h1>")));
echo $html;
Output:
<h1>Lorem ipsum dolores amet</h1>
THIS IS SOME TEXT
<h2>Lorem ipsum dolores amet</h2>
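The substr/str_replace cleanup above depends on the fragment starting with <h1> and on the exact shape of the serialized wrapper. A possible variation, sketched here under the assumption that the input is a fragment like $var: wrap it in a container div and serialize only that div's children, since saveHTML() accepts an optional node argument (PHP 5.3.6 and later). The function name capitalize_headings is hypothetical, chosen for illustration.

```php
<?php
// Hypothetical helper: capitalize the text of every h1-h6 in an HTML fragment.
function capitalize_headings(string $fragment): string
{
    $dom = new DOMDocument();
    // Collect libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so all of its nodes share one known parent element.
    $dom->loadHTML('<div>' . $fragment . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//h1|//h2|//h3|//h4|//h5|//h6') as $heading) {
        $heading->nodeValue = ucfirst(strtolower($heading->nodeValue));
    }

    // The wrapper is the first div in document order; serialize only its
    // children, so no doctype, <html> or <body> ends up in the result.
    $root = $dom->getElementsByTagName('div')->item(0);
    $html = '';
    foreach ($root->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

$var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";

echo capitalize_headings($var);
```

As in the answer above, libxml normalizes tag names, so <H2> comes back as <h2>.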
It's not $matches[0], it's $matches[1]. $matches[0] refers to the entire match (i.e., the ucfirst and strtolower functions are applied to the whole match, tags included), whereas $matches[1] refers to the characters captured by group index 1. Because we included <h[1-9]> in the regex, the match starts with the opening <h> tag. But in the replacement part we return only group index 1, as in ucfirst(strtolower($matches[1])), so the opening <h> tags get removed (and so does the final >, which is matched outside the group). See the example below.
$var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";
$line = preg_replace_callback(
'/<h[1-9]>(.*)\>/i',
function ($matches) {
return ucfirst(strtolower($matches[1]));
},
$var
);
print($line);
Output:
Lorem ipsum dolores amet</h1
THIS IS SOME TEXT
Lorem ipsum dolores amet</h2
But the above also removes the opening <h1> and <H2> tags. So I recommend the version below, which applies strtolower and ucfirst only to the text inside the <h> tags.
$var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";
$line = preg_replace_callback(
'/<h[1-9]>\K.*?(?=<)/i',
function ($matches) {
return ucfirst(strtolower($matches[0]));
},
$var
);
print($line);
Output:
<h1>Lorem ipsum dolores amet</h1>
THIS IS SOME TEXT
<H2>Lorem ipsum dolores amet</H2>
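\K discards the previously matched characters from the final match, so the opening tag is not part of $matches[0] and is left untouched. .*? does a non-greedy match of any character, zero or more times, up to the position where the lookahead (?=<) sees a literal < symbol.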
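If you would rather not rely on \K, a sketch of the same idea with plain capturing groups: capture the opening tag, the heading text and the closing tag separately, transform only the middle group, and glue the three back together. This assumes the headings contain no nested tags (such as <em>), as in the example; the digit range is narrowed to [1-6] because HTML only defines h1 to h6.

```php
<?php
$var = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>";

$line = preg_replace_callback(
    // Group 1: opening tag, group 2: heading text, group 3: closing tag.
    '~(<h[1-6]>)([^<]*)(</h[1-6]>)~i',
    function ($matches) {
        // Only the text in group 2 is changed; both tags are returned as-is.
        return $matches[1] . ucfirst(strtolower($matches[2])) . $matches[3];
    },
    $var
);
print($line);
```

The original tag case is preserved here, so <H2> stays <H2>, just as with the \K version.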
You are returning the entire match using $matches[0]. Use lookarounds in this case.
I would recommend using a capturing group within the opening <h...> tag so you can use it as a backreference; that way the closing tag must have the same number as the opening tag matched by that group.
$text = preg_replace_callback('~<h([1-9])>\K[^<]++(?=</h\1>)~i',
function($m) {
return ucfirst(strtolower($m[0]));
}, $text);
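To see what the \1 backreference buys, here is a small sketch that runs that pattern on the question's input plus one deliberately mismatched <h3>...</h4> line (made up for illustration). The matched pairs are capitalized, while the mismatched one is left alone because \1 must repeat the digit captured from the opening tag, and the possessive [^<]++ prevents any backtracking into a shorter match.

```php
<?php
$text = "
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>
<h3>MISMATCHED TAGS</h4>";

$text = preg_replace_callback('~<h([1-9])>\K[^<]++(?=</h\1>)~i',
    function ($m) {
        // $m[0] is only the heading text: \K dropped the opening tag,
        // and the lookahead checks the closing tag without consuming it.
        return ucfirst(strtolower($m[0]));
    }, $text);

print($text);
```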
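Although you can do this using regex, I recommend using DOM for this. (loadHTML() is an instance method; calling it statically is deprecated and fails in PHP 8, so create the document first.)
$doc = new DOMDocument();
$doc->loadHTML('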
<h1>LOREM IPSUM DOLORES AMET</h1>
THIS IS SOME TEXT
<H2>LOREM IPSUM DOLORES AMET</H2>
');
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//h1|//h2|//h3|//h4|//h5|//h6');
foreach ($nodes as $node) {
$node->nodeValue = ucfirst(strtolower($node->nodeValue));
}
echo $doc->saveHTML();
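All of the snippets here use ucfirst(strtolower(...)). Those functions work byte by byte and, in the default locale (and always from PHP 8.2 on), only change ASCII letters, so a heading like "ÉTÉ ÉTERNEL" keeps its inner É characters uppercase. A minimal multibyte-aware sketch, assuming UTF-8 text and the mbstring extension; capitalize_text is a hypothetical name that could stand in for ucfirst(strtolower(...)) in any of the loops or callbacks above. When going through DOMDocument, the markup should also declare its charset, since loadHTML() otherwise assumes ISO-8859-1.

```php
<?php
// Hypothetical helper: a multibyte-aware ucfirst(strtolower($s)).
// Assumes UTF-8 input and that the mbstring extension is loaded.
function capitalize_text(string $s): string
{
    $lower = mb_strtolower($s, 'UTF-8');
    // Uppercase the first character (not the first byte), keep the rest lowercase.
    return mb_strtoupper(mb_substr($lower, 0, 1, 'UTF-8'), 'UTF-8')
        . mb_substr($lower, 1, null, 'UTF-8');
}

echo ucfirst(strtolower('ÉTÉ ÉTERNEL')), "\n"; // byte-based version, for comparison
echo capitalize_text('ÉTÉ ÉTERNEL'), "\n";
```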
No regex needed. Don't use regex to parse HTML. Ever.
<?php
$HTMLString = <<<HTML
<h1>lorem ipsum dolores amet</h1>
THIS IS SOME TEXT
<h2>lorem ipsum dolores amet</h2>
HTML;
$doc = new DOMDocument();
$doc->loadHTML($HTMLString);
//You can also use xpath. Loop results after using this instead:
//$xpath = new DOMXPath($doc);
//$nodeList = $xpath->query('//h2');
$nodeList = $doc->getElementsByTagName('h2');
foreach ($nodeList as $node) {
$stringArray = explode(' ', strtolower($node->nodeValue));
$stringArray[0] = ucfirst($stringArray[0]);
$capitalizedSentence = implode(' ', $stringArray);
echo $capitalizedSentence;
}