I am looking for suitable replacement code that allows me to replace the content inside any HTML tag that has a certain class, e.g.:
$class = "blah";
$content = "new content";
$html = '<div class="blah">hello world</div>';
// code to replace, $html now looks like:
// <div class="blah">new content</div>
Bear in mind that the tag can be of any type and the class attribute may contain several classes:
<h2 class="blah">
<div class="foo blah green">hello world</div>
I am thinking regular expressions should be able to do this; if not, I am open to other suggestions such as using the DOM class (although I would rather avoid this if possible, because it has to be PHP4 compatible).
Do not use regular expressions to parse HTML. You can use the built-in DOMDocument, or something like simple_html_dom:
require_once("simple_html_dom.php");
$class = "blah";
$content = "new content";
$html = '<div class="blah">hello world</div>';
$doc = new simple_html_dom();
$doc->load($html);
foreach ( $doc->find("." . $class) as $node ) {
    $node->innertext = $content;
}
Sorry, I didn't see the PHP4 requirement. Here's a solution using the standard DOMDocument as mentioned above.
function DOM_getElementByClassName($referenceNode, $className, $index=false) {
    $className = strtolower($className);
    // Escape regex metacharacters so the class name is matched literally.
    $pattern = "/\b" . preg_quote($className, "/") . "\b/";
    $response = array();
    foreach ( $referenceNode->getElementsByTagName("*") as $node ) {
        $nodeClass = strtolower($node->getAttribute("class"));
        if (
            $nodeClass == $className ||
            preg_match($pattern, $nodeClass)
        ) {
            $response[] = $node;
        }
    }
    if ( $index !== false ) {
        return isset($response[$index]) ? $response[$index] : false;
    }
    return $response;
}
$doc = new DOMDocument();
$doc->loadHTML($html);
foreach ( DOM_getElementByClassName($doc, $class) as $node ) {
    $node->nodeValue = $content;
}
echo $doc->saveHTML();
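One caveat with the `\b` check in the function above: a hyphen counts as a word boundary, so "blah" would also match a class such as "blah-box". A minimal sketch of a stricter test is to split the attribute on whitespace and compare whole tokens. The helper name `has_class_token` is hypothetical, and it compares case-sensitively, which is an assumption; lowercase both arguments first if you want the behaviour of the function above.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Returns true only if $className is one complete, whitespace-separated
// token of the class attribute value $classAttr.
function has_class_token($classAttr, $className) {
    // Split on any run of whitespace and drop empty pieces.
    $tokens = preg_split('/\s+/', $classAttr, -1, PREG_SPLIT_NO_EMPTY);
    // Strict comparison so only an exact string match counts.
    return in_array($className, $tokens, true);
}

var_dump(has_class_token('foo blah green', 'blah'));
var_dump(has_class_token('blah-box', 'blah'));
```

Inside the loop, `has_class_token($node->getAttribute("class"), $className)` could stand in for the two-part condition.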
There is no need to use the DOM class. This would probably be done quickest using jQuery, as Khnle said, or you could use the preg_replace() function. Give me some time and I may write a quick regex for you.
But I would recommend using something like jQuery; this way you can serve the page to the user quickly and let their computer do the processing instead of your server.
If you are sure that $html is valid HTML code, you could use an HTML parser, or even an XML parser if it's valid XML code.
But the quick and dirty way with a regex would be something like:
$html = preg_replace('/(<[^>]+ class="[^"]*' . preg_quote($class, '/') . '[^"]*"[^>]*>)[^<]+(<\/[^>]+>)/siU', '$1' . $content . '$2', $html);
Didn't test it too much, but it should work. Tell me if you find cases where it doesn't. ;)
Edit: Added "and dirty"... ;)
Edit 2: New version of the RegEx:
<?php
$class = "blah";
$content = "new content";
$html = '<div class="blah test"><h1><span>hello</span> world</h1></div><div class="other">other content</div><h2 class="blah">remove this</h2>';
$html = preg_replace('/<([\w]+)(\s[^>]*class="[^"]*' . $class . '[^"]*"[^>]*>).+(<\/\\1>)/siU', '<$1$2' . $content . '$3', $html);
echo $html;
?>
The last remaining problem is a class that merely contains "blah" in its name, like "tooMuchBlahNow". Let's see how we can address that. Btw: Is it obvious already that I love playing with RegEx? ;)
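One possible way to handle the "tooMuchBlahNow" case, as a rough sketch rather than a tested solution: require the class name to be separated from its neighbours by whitespace or by the attribute's quotes. The function name `replace_content_by_class` is made up for illustration, and the closure assumes PHP 5.3 or later (on PHP 4 a named callback function would be needed instead).

```php
<?php
// Hypothetical helper, not part of the original answer.
// Replaces the inner content of each element whose class attribute
// contains $class as a complete, whitespace-separated token.
function replace_content_by_class($html, $class, $content) {
    // Escape regex metacharacters in the class name.
    $quoted = preg_quote($class, '/');

    // (?:[^"]*\s)? and (?:\s[^"]*)? allow other classes before or after,
    // but only when separated from $class by whitespace.
    $pattern = '/<(\w+)(\s[^>]*class="(?:[^"]*\s)?' . $quoted
             . '(?:\s[^"]*)?"[^>]*>).+(<\/\1>)/siU';

    // A callback keeps "$1" or "\1" inside $content from being read
    // as backreferences. Closures assume PHP 5.3 or later.
    return preg_replace_callback($pattern, function ($m) use ($content) {
        return '<' . $m[1] . $m[2] . $content . $m[3];
    }, $html);
}

$html = '<div class="blah test"><h1><span>hello</span> world</h1></div>'
      . '<p class="tooMuchBlahNow">keep me</p>'
      . '<h2 class="blah">remove this</h2>';

echo replace_content_by_class($html, 'blah', 'new content');
```

The paragraph with class "tooMuchBlahNow" is left untouched here. Like the version above, this still stops at the first matching closing tag, so an element nested inside another element of the same tag name will trip it up.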