I'm looking for a regexp sequence of matches and replaces (preferably in PHP, but any language will do) to change the input below into the output below. The text at the start and end is just random text that needs to be preserved.
IN:
fkdshfks khh fdsfsk
<!--g1-->
<div class='codetop'>CODE: AutoIt</div>
<div class='geshimain'>
<!--eg1-->
<div class="autoit" style="font-family:monospace;">
<span class="kw3">msgbox</span>
</div>
<!--gc2-->
<!--bXNnYm94-->
<!--egc2-->
<!--g2-->
</div>
<!--eg2-->
fdsfdskh
to this OUT:
fkdshfks khh fdsfsk
<div class='codetop'>CODE: AutoIt</div>
<div class='geshimain'>
<div class="autoit" style="font-family:monospace;">
<span class="kw3">msgbox</span>
</div>
</div>
fdsfdskh
Thanks.
Are you just trying to remove the comments? How about
s/<!--[^>]*-->//g
or the slightly better version (suggested by the questioner himself):
<!--(.*?)-->
But remember, HTML is not regular, so using regular expressions to parse it will lead you into a world of hurt when somebody throws bizarre edge cases at it.
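To see where the two patterns differ, here is a small PHP sketch. The sample strings and variable names are made up for illustration; they are not from the question.

```php
<?php
// Hypothetical sample inputs, not taken from the question.
$simple = "a <!-- note --> b";
$nested = "a <!-- see <b>this</b> --> b";

// Negated character class: gives up as soon as the comment contains a '>'.
$negated = '/<!--[^>]*-->/';
// Lazy quantifier: runs up to the first '-->' instead.
$lazy = '/<!--(.*?)-->/';

// Both patterns remove a comment that has no '>' inside it.
echo preg_replace($negated, '', $simple), "\n";
echo preg_replace($lazy, '', $simple), "\n";

// Only the lazy pattern removes a comment that contains tags.
echo preg_replace($negated, '', $nested), "\n";
echo preg_replace($lazy, '', $nested), "\n";
```

Note that preg_replace replaces every match by default, so PHP needs no equivalent of the /g flag.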
Ah I've done it,
<!--(.*?)-->
preg_replace('/<!--(.*)-->/Uis', '', $html)
This PHP code will remove all HTML comments from the $html string.
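Applied to the input from the question, that call leaves an empty line wherever a comment stood alone on its line. A minimal sketch that also drops those lines could look like this; the helper name and the second replacement are my own assumptions, not part of the answer above.

```php
<?php
// Hypothetical helper name; the first pattern is the one from the answer above.
function strip_comments_and_blank_lines(string $html): string
{
    // U = ungreedy, s = "." also matches newlines, i = case-insensitive.
    $html = preg_replace('/<!--(.*)-->/Uis', '', $html);
    // Assumption: lines left empty by the removal should be dropped as well.
    return preg_replace('/^\s*\R/m', '', $html);
}

// The IN sample from the question.
$in = <<<'HTML'
fkdshfks khh fdsfsk
<!--g1-->
<div class='codetop'>CODE: AutoIt</div>
<div class='geshimain'>
<!--eg1-->
<div class="autoit" style="font-family:monospace;">
<span class="kw3">msgbox</span>
</div>
<!--gc2-->
<!--bXNnYm94-->
<!--egc2-->
<!--g2-->
</div>
<!--eg2-->
fdsfdskh
HTML;

echo strip_comments_and_blank_lines($in), "\n";
```

If blank lines in the original markup matter to you, skip the second replacement and keep only the first.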
Try the following if your comments contain line breaks:
/<!--(.|\n)*?-->/g
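In PHP the same idea can be written either with the explicit newline alternation or with the s (DOTALL) modifier. The sketch below compares both against a plain "." on a made-up multi-line sample; the variable names are only for illustration.

```php
<?php
// Hypothetical sample with a comment that spans several lines.
$html = "before\n<!--\nmulti line html comment\n-->\nafter";

// Without the s modifier "." does not match newlines, so nothing is removed.
$noDotAll = preg_replace('/<!--.*?-->/', '', $html);

// Alternation with an explicit newline.
$alternation = preg_replace('/<!--(.|\n)*?-->/', '', $html);

// The s modifier makes "." match newlines too.
$dotAll = preg_replace('/<!--.*?-->/s', '', $html);

var_dump($noDotAll === $html);       // comment still present
var_dump($alternation === $dotAll);  // both variants agree
```

On very large inputs the alternation form tends to be slower than the s modifier, so the modifier is probably the safer default in PHP.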
Do not forget to consider conditional comments, as
<!--(.*?)-->
will remove them. Try this instead:
<!--[^\[](.*?)-->
This will also remove downlevel-revealed conditional comments, though.
EDIT:
The following won't remove downlevel-revealed or downlevel-hidden comments:
<!--(?!<!)[^\[>].*?-->
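Here is a rough PHP check of how the three patterns treat conditional comments. The sample strings are my own and are kept on one line each, because these patterns use a plain "." without the s modifier.

```php
<?php
// Hypothetical single-line samples.
$plain    = 'x<!-- plain -->y';
$hidden   = '<!--[if IE]><p>IE only</p><![endif]-->';
$revealed = '<!--[if !IE]><!--><p>not IE</p><!--<![endif]-->';

$naive    = '~<!--(.*?)-->~';            // removes every comment
$noBrack  = '~<!--[^\[](.*?)-->~';       // skips comments that start with "["
$careful  = '~<!--(?!<!)[^\[>].*?-->~';  // also skips "<!-->" and "<!--<!"

foreach ([$naive, $noBrack, $careful] as $pattern) {
    echo $pattern, "\n";
    echo '  plain:    ', preg_replace($pattern, '', $plain), "\n";
    echo '  hidden:   ', preg_replace($pattern, '', $hidden), "\n";
    echo '  revealed: ', preg_replace($pattern, '', $revealed), "\n";
}
```

The second pattern keeps the downlevel-hidden comment but swallows the content of the downlevel-revealed one, because it starts matching at "<!-->" and runs to the final "-->". Only the third pattern leaves both conditional comments untouched.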
This code also removes JavaScript code, which is too bad :|
Here's an example of JavaScript code that will be removed by this regex:
<script type="text/javascript"><!--
var xxx = 'a';
//-->
</script>
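One possible workaround, which is my own assumption and not something suggested in this thread, is to split the markup on script blocks first and strip comments only from the pieces in between. The function name below is hypothetical.

```php
<?php
// Hypothetical helper: strip comments everywhere except inside <script> blocks.
function strip_comments_outside_scripts(string $html): string
{
    // Split the input, keeping each <script>...</script> block as its own piece.
    $parts = preg_split(
        '~(<script\b[^>]*>.*?</script>)~is',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    foreach ($parts as $i => $part) {
        // Even indexes are the text between script blocks; odd ones are the blocks.
        if ($i % 2 === 0) {
            $parts[$i] = preg_replace('/<!--[\s\S]*?-->/', '', $part);
        }
    }

    return implode('', $parts);
}

// Made-up sample built around the script example above.
$html = "<p>a</p><!-- gone -->\n"
      . "<script type=\"text/javascript\"><!--\nvar xxx = 'a';\n//-->\n</script>\n"
      . "<p>b</p>";

$naive = preg_replace('/<!--[\s\S]*?-->/', '', $html);
$safe  = strip_comments_outside_scripts($html);

echo $safe, "\n";
```

This is still regex-based, so it inherits the usual edge cases, for example a script block that sits inside a comment is left alone rather than removed.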
<!--([\s\S]*?)-->
This also works in JavaScript and VBScript, since "." doesn't match line breaks in all languages.
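function remove_html_comments($html) {
    // Match every HTML comment, including multi-line ones.
    $expr = '/<!--[\s\S]*?-->/';
    // Let rhc() decide, comment by comment, whether to keep or drop it.
    return preg_replace_callback($expr, 'rhc', $html);
}
function rhc($search) {
    // $search[0] holds the full text of the matched comment.
    list($l) = $search;
    // Keep conditional comments such as <!--[if IE]> ... <![endif]-->.
    if (stripos($l, '[if') !== false || stripos($l, '[endif') !== false) {
        return $l;
    }
    // Drop every other comment.
    return '';
}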
A better version would be:
(?=<!--)([\s\S]*?)-->
It matches html comments like these:
<!--
multi line html comment
-->
or
<!-- single line html comment -->
Most importantly, it also matches comments that contain angle brackets, like the one below. Some of the other regexes shown, such as <!--[^>]*-->, do not cover this case:
<!-- this is my blog: <mynixworld.inf> -->
Note
Although the one below is syntactically an HTML comment, your browser might parse it differently, so it might have a special meaning. Stripping such strings might break your code.
<!--[if !(IE 8) ]><!-->
Here is my attempt:
<!--(?!<!)[^\[>][\s\S]*?-->
This will also remove multi-line comments, and it won't remove downlevel-revealed or downlevel-hidden comments.
// Remove multi-line comments (but keep the ones that start with /*-)
$mlcomment = '/\/\*(?!-)[\x00-\xff]*?\*\//';
$code = preg_replace($mlcomment, "", $code);
// Remove single-line comments; the lookbehind skips the // in URLs such as http://
$slcomment = '/(?<!:)\/\/.*/';
$code = preg_replace($slcomment, "", $code);
// Collapse runs of whitespace into a single space
$extra_space = '/\s+/';
$code = preg_replace($extra_space, " ", $code);
// Remove spaces around punctuation where they are not needed
$removable_space = '/\s?([\{\};\=\(\)\/\+\*-])\s?/';
$code = preg_replace($removable_space, "\\1", $code);