Given the following code:
<body>
<img src="source.jpg" />
<p>
<img src="source.jpg" id ="hello" alt="nothing" />
<img src="source.jpg" id ="world"/>
</p>
</body>
What's the best way - using a regular expression (or better?) - to replace it so it becomes this:
<body>
<img src="source.jpg" id="img_0" />
<p>
<img src="source.jpg" id ="img_1" alt="nothing" />
<img src="source.jpg" id ="img_2"/>
</p>
</body>
In other words:
All the <img /> tags get populated with an id attribute.
The id attribute should contain an incrementing value (this is not really the problem though, as it's just part of the replace procedure).
I guess two passes are needed, one to remove all the existing id attributes and another to populate them with new ones?
I think the best approach is to use preg_replace_callback.
Also, I would recommend a slightly more stringent regexp than those suggested so far: what if your page contains an <img /> tag that does not contain an id attribute?
$page = '
<body>
<img src="source.jpg" />
<p>
<img src="source.jpg" id ="hello" alt="nothing" />
<img src="source.jpg" id ="world"/>
</p>
</body>';
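// Callback: keep group 1 (everything up to the opening quote of the id value)
// and replace the old id value (group 2) with the next sequential one.
function my_callback($matches)
{
    static $i = 0; // counter persists across calls
    return $matches[1]."img_".$i++;
}
print preg_replace_callback('/(<img[^>]*id\s*=\s*")([^"]*)/', "my_callback", $page);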
Which produces the following for me:
<body>
<img src="source.jpg" />
<p>
<img src="source.jpg" id ="img_0" alt="nothing" />
<img src="source.jpg" id ="img_1"/>
</p>
</body>
The regexp has two capturing groups: the first we preserve, the second we replace. I've used negated character classes (e.g. [^>]* = up to the closing >) so that a match stays within a single tag, which means <img /> tags aren't required to have id attributes.
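To make the two capturing groups concrete, here is a small sketch that runs the same pattern through preg_match_all and prints what each group holds. The sample string is made up for illustration; its first tag deliberately has no id, so it should not match at all.

```php
<?php
// Hypothetical sample input: the first <img> has no id attribute.
$page = '<img src="a.jpg" /><img src="b.jpg" id ="hello" alt="x" />';

// Same pattern as in the answer above:
//   group 1 = everything from "<img" up to the opening quote of the id value
//   group 2 = the current id value
preg_match_all('/(<img[^>]*id\s*=\s*")([^"]*)/', $page, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo "kept prefix: {$m[1]}\n";
    echo "old id:      {$m[2]}\n";
}
```

Only the second tag matches, which is why the tag without an id comes through unchanged in the output shown above.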
With appropriate escaping (that I can never remember without trial and error), and something to increment the img_number, you want to replace something like this:
(<img .*?)(?:id=".*")?(.*?/>)
with something like this:
\1 id="img_$i"\2
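One thing to watch with the pattern above: because the first group is lazy and the id group is optional, a backtracking engine will usually succeed with the id group empty and leave the old id inside the second group. A hedged way around that is to do the two steps the question guessed at, inside a single callback: strip any existing id, then append a new one. The helper name renumber_img_ids() is invented for this sketch, and it assumes quoted attribute values with no > inside them.

```php
<?php
// Sketch only: renumber_img_ids() is a hypothetical helper, not part of any library.
function renumber_img_ids(string $html): string
{
    $i = 0;
    $result = preg_replace_callback(
        // Group 1 = the tag's attributes, group 2 = the optional self-closing slash.
        '/<img\b([^>]*?)\s*(\/?)>/i',
        function (array $m) use (&$i): string {
            // Step 1: drop any existing id attribute (double- or single-quoted).
            $attrs = preg_replace('/\s+id\s*=\s*("[^"]*"|\'[^\']*\')/i', '', $m[1]);
            // Step 2: append the new sequential id and restore the tag ending.
            $end = $m[2] === '/' ? ' />' : '>';
            return '<img' . $attrs . ' id="img_' . $i++ . '"' . $end;
        },
        $html
    );
    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

$html = '<img src="source.jpg" /><img src="source.jpg" id ="hello" alt="nothing" /><img src="source.jpg" id ="world"/>';
echo renumber_img_ids($html), "\n";
```

Note that the new id is always appended after the remaining attributes, so the attribute order can differ from the desired output in the question even though every tag ends up with a unique id.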
<?php
$data = <<<DATA
<body>
<img src="source.jpg" />
<p>
<img src="source.jpg" id ="hello" alt="nothing" />
<img src="source.jpg" id ="world"/>
</p>
</body>
DATA;
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;
// The sample is well-formed XML, so loadXML() can parse it.
$doc->loadXML($data, LIBXML_NOWARNING | LIBXML_NOERROR);
$id = 0;
// setAttribute() overwrites an existing id or adds a missing one.
foreach ($doc->getElementsByTagName("img") as $img)
{
    $img->setAttribute('id', "img_$id");
    $id++;
}
echo $doc->saveHTML();
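The loadXML() call above relies on the sample being well-formed XML. For markup that is not (unclosed <img> or <br> tags, for example), a variant using loadHTML() may be more forgiving. This is a sketch under assumptions: the input fragment is invented, and the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags need a reasonably recent PHP and libxml build.

```php
<?php
// Assumed input: an HTML fragment that is not well-formed XML.
$html = '<div><img src="a.jpg"><br><img src="b.jpg" id="old"></div>';

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// The flags ask libxml not to add <html>/<body> wrappers or a doctype.
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$id = 0;
foreach ($doc->getElementsByTagName('img') as $img) {
    // Overwrites an existing id or adds a missing one.
    $img->setAttribute('id', 'img_' . $id);
    $id++;
}

echo $doc->saveHTML();
```

Because the parser works on the element tree, no separate pass is needed to remove old id attributes first.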