I have this (validated) HTML string:
<div><img src="images/stories/2014/AAA.gif" alt="AAA" width="24" height="24" /> THE PRODUCTION OF: PLASTIC BOTTLES <br /></div>
I need to extract only the title next to the <img> tag, trim all whitespace before and after it, then wrap it in an <h1> tag. The expected result should be:
<div><h1>THE PRODUCTION OF: PLASTIC BOTTLES</h1></div>
I've written a regular expression that works, but it also includes the spaces in the final result:
/<img\s*src="[^"]+"\s*alt="AAA"\s*width="24"\s*height="24"\s*\/>\s*([^<]+)\s*<br\s*\/>/
The image can be identified by these specific values of its alt, width and height attributes. Thanks.
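Making your match non-greedy should do the trick: <img\s*src="[^"]+"\s*alt="AAA"\s*width="24"\s*height="24"\s*\/>\s*([^<]+?)\s*<br\s*\/>
(notice the extra ? next to [^<]+). With the lazy quantifier, the capture group stops as early as it can, so the \s* that follows it absorbs the trailing whitespace. More information is available here.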
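To show the non-greedy pattern in context, here is a minimal sketch that plugs it into preg_replace, assuming the code runs in PHP and the HTML is held in a string. The variable names ($html, $pattern, $result) are only illustrative and not from the question.

```php
<?php
// Input string taken from the question.
$html = '<div><img src="images/stories/2014/AAA.gif" alt="AAA" width="24" height="24" /> THE PRODUCTION OF: PLASTIC BOTTLES <br /></div>';

// Same pattern as above. The lazy +? keeps the capture as short as possible,
// so the \s* after the group consumes the trailing whitespace.
$pattern = '/<img\s*src="[^"]+"\s*alt="AAA"\s*width="24"\s*height="24"\s*\/>\s*([^<]+?)\s*<br\s*\/>/';

// $1 refers to the captured title, already free of surrounding whitespace.
$result = preg_replace($pattern, '<h1>$1</h1>', $html);

echo $result, PHP_EOL;
// prints: <div><h1>THE PRODUCTION OF: PLASTIC BOTTLES</h1></div>
```

The replacement swaps the whole matched span (image, title and line break) for the <h1> element, so the surrounding <div> is left as it was.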
That being said, you should really be using something like the PHP DOM Parser to process HTML.
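As a rough illustration of the parser route, here is a sketch using PHP's built-in DOMDocument and DOMXPath classes. It assumes the fragment has a single root element and that the title is the text node directly after the image; the variable names are made up for the example.

```php
<?php
// Input string taken from the question.
$html = '<div><img src="images/stories/2014/AAA.gif" alt="AAA" width="24" height="24" /> THE PRODUCTION OF: PLASTIC BOTTLES <br /></div>';

$doc = new DOMDocument();
// These flags ask libxml not to add a doctype or <html>/<body> wrappers.
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($doc);
// Find the image by the attribute values that identify it.
foreach ($xpath->query('//img[@alt="AAA"][@width="24"][@height="24"]') as $img) {
    $text = $img->nextSibling;
    if ($text === null || $text->nodeType !== XML_TEXT_NODE) {
        continue; // no title text directly after this image
    }

    // Build <h1> with the trimmed title as a text node.
    $h1 = $doc->createElement('h1');
    $h1->appendChild($doc->createTextNode(trim($text->nodeValue)));

    // Drop the trailing <br />, if present, then swap the text for the <h1>.
    $br = $text->nextSibling;
    if ($br !== null && $br->nodeName === 'br') {
        $br->parentNode->removeChild($br);
    }
    $text->parentNode->replaceChild($h1, $text);
    $img->parentNode->removeChild($img);
}

$result = trim($doc->saveHTML());
echo $result, PHP_EOL;
```

This is longer than the regex, but it does not depend on attribute order or on the exact spacing inside the tags.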
Actually, there's a simple enough way to do this without regex at all.
$result = '<div><h1>' . trim(strip_tags($original_html)) . '</h1></div>';
First remove all tags, then trim the whitespace, finally wrap it in whatever tags you need.
I think a better solution is to use jQuery, specifically the .text() method:
<div id='mydiv'><img src="images/stories/2014/AAA.gif" alt="AAA" width="24" height="24" /> THE PRODUCTION OF: PLASTIC BOTTLES <br /></div>
<script>
// .text() keeps the surrounding whitespace, so trim it before wrapping.
var text = $.trim($('#mydiv').text());
$('#mydiv').html('<h1>' + text + '</h1>');
</script>
And the result is:
<div id="mydiv"><h1>THE PRODUCTION OF: PLASTIC BOTTLES</h1></div>