Regular expression to trim whitespace from a string contained in HTML markup

I have this HTML string (validated):

<div><img src="images/stories/2014/AAA.gif" alt="AAA" width="24" height="24" /> THE PRODUCTION OF: PLASTIC BOTTLES   <br /></div>

I have to extract the title next to the <img> tag, trimming all whitespace before and after it, then wrap it in an <h1> tag. The expected result should be:

<div><h1>THE PRODUCTION OF: PLASTIC BOTTLES</h1></div>

I've written a regular expression that works, but it also includes the spaces in the final result:

/<img\s*src="[^"]+"\s*alt="AAA"\s*width="24"\s*height="24"\s*\/>\s*([^<]+)\s*<br\s*\/>/

The image can be identified by the specific values of its alt, width and height attributes. Thanks.

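Making your capture group non-greedy should do the trick: <img\s*src="[^"]+"\s*alt="AAA"\s*width="24"\s*height="24"\s*\/>\s*([^<]+?)\s*<br\s*\/> (notice the extra ? after [^<]+). With the lazy quantifier the group stops as early as it can, so the \s* that follows consumes the spaces before <br /> instead of the group capturing them.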
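To see the difference between the two quantifiers, here is a minimal JavaScript sketch run against the sample string from the question. The variable names (html, greedy, lazy and so on) are made up for illustration. The same pattern should behave the same way with PHP's preg_replace, but that is not shown here.

```javascript
// Sample markup from the question (note the three spaces before <br />).
const html = '<div><img src="images/stories/2014/AAA.gif" alt="AAA" width="24" height="24" /> THE PRODUCTION OF: PLASTIC BOTTLES   <br /></div>';

// Original pattern: the greedy [^<]+ also swallows the trailing spaces.
const greedy = /<img\s*src="[^"]+"\s*alt="AAA"\s*width="24"\s*height="24"\s*\/>\s*([^<]+)\s*<br\s*\/>/;

// Suggested pattern: the lazy [^<]+? leaves the trailing spaces to the next \s*.
const lazy = /<img\s*src="[^"]+"\s*alt="AAA"\s*width="24"\s*height="24"\s*\/>\s*([^<]+?)\s*<br\s*\/>/;

const greedyTitle = html.match(greedy)[1]; // captured text, trailing spaces included
const lazyTitle = html.match(lazy)[1];     // captured text, already trimmed

// Replace the whole <img ...> title <br /> run with the wrapped title.
const wrapped = html.replace(lazy, '<h1>$1</h1>');

console.log(JSON.stringify(greedyTitle));
console.log(JSON.stringify(lazyTitle));
console.log(wrapped);
```

JSON.stringify is only used so that the trailing spaces are visible in the printed output.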

That being said, you should really be using something like the PHP DOM Parser to process HTML.

Actually, there's a simple enough way to do this without regex at all.

$result = '<div><h1>' . trim(strip_tags($original_html)) . '</h1></div>';

First remove all tags, then trim the whitespace, finally wrap it in whatever tags you need.
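For anyone doing this in JavaScript rather than PHP, the same three steps can be sketched roughly as follows. The function name wrapTitle is hypothetical. JavaScript has no built-in strip_tags, so this sketch drops tags with a naive pattern, which is only an approximation and would break on attribute values that contain a > character.

```javascript
// Hypothetical helper mirroring the three steps: strip tags, trim, wrap.
function wrapTitle(originalHtml) {
  // Step 1: remove anything that looks like a tag (naive stand-in for strip_tags).
  const textOnly = originalHtml.replace(/<[^>]*>/g, '');
  // Step 2: trim leading and trailing whitespace.
  const title = textOnly.trim();
  // Step 3: wrap the title in the tags you need.
  return '<div><h1>' + title + '</h1></div>';
}

const sample = '<div><img src="images/stories/2014/AAA.gif" alt="AAA" width="24" height="24" /> THE PRODUCTION OF: PLASTIC BOTTLES   <br /></div>';
console.log(wrapTitle(sample));
```

Like the PHP one-liner, this assumes the title is the only text in the fragment. Any other text nodes would end up inside the <h1> as well.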

I think a better solution is to use jQuery, specifically the .text() method:

<div id='mydiv'><img src="images/stories/2014/AAA.gif" alt="AAA" width="24" height="24" /> THE PRODUCTION OF: PLASTIC BOTTLES   <br /></div>
<script>var text = $('#mydiv').text().trim(); // .text() keeps the surrounding whitespace, so trim it
$('#mydiv').html('<h1>' + text + '</h1>');</script>

And the result is:

 <div id="mydiv"><h1>THE PRODUCTION OF: PLASTIC BOTTLES</h1></div>