I want to remove all the <br /> tags inside tables using PHP. I know I could use str_replace() to remove <br />, but that would remove every <br /> in the string. I only want to remove the <br /> tags between <table> and </table>. I have several tables in one string.
The HTML is below. You can also see this fiddle.
<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br /> <tr><br /> <td><br /> <p><strong>column1</strong></p> </td><br /> <td><br /> <p><strong>column2</strong></p> </td></tr><br /> <tr><br /> <td><br /> <p>1</p> </td><br /> <td><br /> <p>2</p> </td><br /> <br /> </tr><br /> </tbody><br /></table>
<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br /> <tr><br /> <td><br /> <p><strong>column1</strong></p> </td><br /> <td><br /> <p><strong>column2</strong></p> </td></tr><br /> <tr><br /> <td><br /> <p>1</p> </td><br /> <td><br /> <p>2</p> </td><br /> <br /> </tr><br /> </tbody><br /></table>
I tried the following. Is this the best solution?
<?php
$input = '<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br /> <tr><br /> <td><br /> <p><strong>column1</strong></p> </td><br /> <td><br /> <p><strong>column2</strong></p> </td></tr><br /> <tr><br /> <td><br /> <p>1</p> </td><br /> <td><br /> <p>2</p> </td><br /> <br /> </tr><br /> </tbody><br /></table>
<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br /> <tr><br /> <td><br /> <p><strong>column1</strong></p> </td><br /> <td><br /> <p><strong>column2</strong></p> </td></tr><br /> <tr><br /> <td><br /> <p>1</p> </td><br /> <td><br /> <p>2</p> </td><br /> <br /> </tr><br /> </tbody><br /></table>';
$body = preg_replace_callback("~<table\b.*?/table>~si", "process_table", $input);
function process_table($match) {
return str_replace('<br />', '', $match[0]);
}
echo $body;
As stated here, "Regex is not a tool that can be used to correctly parse HTML". However, since a regex solution was asked for, I submit the following, which works for this controlled case. It includes debug code that shows the string before and after.
Note: I also tested with your regex, /<table\b.*?<\/table>/si, in the preg_match() call, and it works as well.
<?php
$search ='<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br /> <tr><br /> <td><br /> <p><strong>column1</strong></p> </td><br /> <td><br /> <p><strong>column2</strong></p> </td></tr><br /> <tr><br /> <td><br /> <p>1</p> </td><br /> <td><br /> <p>2</p> </td><br /> <br /> </tr><br /> </tbody><br /></table>
<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br /> <tr><br /> <td><br /> <p><strong>column1</strong></p> </td><br /> <td><br /> <p><strong>column2</strong></p> </td></tr><br /> <tr><br /> <td><br /> <p>1</p> </td><br /> <td><br /> <p>2</p> </td><br /> <br /> </tr><br /> </tbody><br /></table>';
$search = replacebr($search);
function replacebr($search){
    $offset = 0;
    $anew = array();
    $notdone = 1;
    $i = 0;
    echo $search;
    while ($notdone == 1) {
        // preg_match() returns 1 while another table is found, 0 when none is left
        $notdone = preg_match('/<table\s[^>]*>(.+?)<\/table>/', $search, $amatch, PREG_OFFSET_CAPTURE, $offset);
        if (count($amatch) > 0) {
            echo "amatch: "; var_dump($amatch);
            // add part before match
            $anew[] = substr($search, $offset, $amatch[0][1] - $offset);
            echo "anew (before): "; var_dump($anew[count($anew) - 1]);
            // add match with replaced text
            $anew[] = str_replace("<br />", "", $amatch[0][0]);
            echo "anew (match): "; var_dump($anew[count($anew) - 1]);
            // PREG_OFFSET_CAPTURE offsets are in bytes, so advance with strlen(), not mb_strlen()
            $offset = $amatch[0][1] + strlen($amatch[0][0]);
            echo "OFFSET: "; var_dump($offset);
        }
        else {
            // add last part of string - we better be done
            $anew[] = substr($search, $offset);
            if ($notdone == 1) {
                die("Error - should be done");
            }
        }
        if ($i == 100) {
            // prevent endless loop
            die("Endless Loop");
        }
        $i++;
    }
    $new = implode("", $anew);
    echo "*******************";
    echo $new;
    return $new;
}
?>
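For comparison, the same "find each table, strip inside it, leave the rest alone" idea can probably be written more compactly with preg_replace_callback(), as in the question. The following is only a sketch for the same controlled case: the function name strip_br_in_tables is made up for illustration, and it assumes that <br>, <br/> and <br /> should all be dropped and that tables are not nested. The s flag lets the pattern match tables that span several lines, which the pattern above without that flag would not.

```php
<?php
// Hypothetical helper name; not part of the original answer.
function strip_br_in_tables(string $html): string
{
    $result = preg_replace_callback(
        // s: let . cross newlines; i: also match <TABLE>
        '~<table\b.*?</table>~si',
        function (array $match): string {
            // Assumption: every spelling of the tag (<br>, <br/>, <br />) goes
            return preg_replace('~<br\s*/?>~i', '', $match[0]);
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input
    return $result ?? $html;
}

// Made-up sample: the table spans several lines, the <br /> outside it must stay
$html = "<p>keep<br /></p><table>\n<tr><br /><td>1<br></td></tr>\n</table><p>keep<br /></p>";
echo strip_br_in_tables($html), PHP_EOL;
```

Like any regex approach, this will misbehave on nested tables or on "</table>" appearing inside attribute values or comments.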
I don't recommend parsing HTML with regex, but if you have to, this might work.
Note: the test case is in Perl, but the regex will work in PHP.
Just globally replace with $1.
# '~(?s)((?:(?!\A|<table\b)\G|<table\b)(?:(?!<br\s*/>|</table\b).)*)<br\s*/>(?=.*?</table\b)~'
(?s)                           # Dot-All
(                              # (1 start), Keep these
     (?:
          (?! \A | <table \b )
          \G                   # Start match from end of last match
       |                       # or,
          <table \b            # Start from '<table\b'
     )
     (?:                       # The chars before <br/ or </table end tags
          (?!
               <br \s* />
            |  </table \b
          )
          .
     )*
)                              # (1 end)
<br \s* />                     # Strip <br/>
(?= .*? </table \b )           # Must be </table end tag downstream
Perl test case
$/ = undef;
$str = <DATA>;
print "Before:
$str
";
$str =~ s~(?s)((?:(?!\A|<table\b)\G|<table\b)(?:(?!<br\s*/>|</table\b).)*)<br\s*/>(?=.*?</table\b)~$1~g;
print "After:
$str
";
__DATA__
<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br /> <tr><br /> <td><br /> <p><strong>column1</strong></p> </td><br /> <td><br /> <p><strong>column2</strong></p> </td></tr><br /> <tr><br /> <td><br /> <p>1</p> </td><br /> <td><br /> <p>2</p> </td><br /> <br /> </tr><br /> </tbody><br /></table>
Output >>
Before:
<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br /> <tr><br /> <td><br /> <p><strong>column1</strong></p> </td><br /> <td><br /> <p><strong>column2</strong></p> </td></tr><br /> <tr><br /> <td><br /> <p>1</p> </td><br /> <td><br /> <p>2</p> </td><br /> <br /> </tr><br /> </tbody><br /></table>
After:
<p>Some text before table:</p><table cellpadding="0" cellspacing="0"> <tbody> <tr> <td> <p><strong>column1</strong></p> </td> <td> <p><strong>column2</strong></p> </td></tr> <tr> <td> <p>1</p> </td> <td> <p>2</p> </td> </tr> </tbody></table>
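Since the question is about PHP, here is a rough sketch of how the same pattern might be used with preg_replace(). The pattern is copied unchanged from above. The sample string is made up for illustration and has one <br /> on each side of the table that should survive.

```php
<?php
// Pattern copied from the answer; ~ is the delimiter and (?s) turns on dot-all.
// Single quotes keep \A, \G, \b and \s intact for the regex engine.
$pattern = '~(?s)((?:(?!\A|<table\b)\G|<table\b)(?:(?!<br\s*/>|</table\b).)*)<br\s*/>(?=.*?</table\b)~';

// Made-up input: two <br /> inside the table, one before it and one after it
$input = '<p>a<br /></p><table><br /><tr><td>1<br /></td></tr></table><p>b<br /></p>';

// preg_replace() replaces every match, so no separate "global" flag is needed;
// '$1' keeps the captured text and drops the <br /> that follows it
$output = preg_replace($pattern, '$1', $input);

echo $output, PHP_EOL;
```

Here \G relies on preg_replace() starting each new match attempt where the previous match ended, which is how the pattern walks from one <br /> to the next inside a table without touching the ones outside it.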