I'm using PHP version 5.6 and I can't figure out why the regular expression won't match the second row correctly.
$str = '<tr><td class="DH">Sale Date</td></tr><tr><td class="DD">10-MAR-15</td></tr><tr><td class="DD">18-APR-17</td></tr>';
preg_match_all('/<tr>.*?class="D.*?<\/tr>/', $str, $matches);
print_r($matches);
preg_match_all('/<tr>.*?class="DH.*?<\/tr>/', $str, $matches);
print_r($matches);
preg_match_all('/<tr>.*?class="DD.*?<\/tr>/', $str, $matches);
print_r($matches);
This code outputs:
Array
(
[0] => Array
(
[0] => <tr><td class="DH">Sale Date</td></tr>
[1] => <tr><td class="DD">10-MAR-15</td></tr>
[2] => <tr><td class="DD">18-APR-17</td></tr>
)
)
Array
(
[0] => Array
(
[0] => <tr><td class="DH">Sale Date</td></tr>
)
)
Array
(
[0] => Array
(
[0] => <tr><td class="DH">Sale Date</td></tr><tr><td class="DD">10-MAR-15</td></tr>
[1] => <tr><td class="DD">18-APR-17</td></tr>
)
)
The regex essentially means: match all shortest sequences between <tr> and </tr> that contain class="D.
Notice how the first regex correctly matches all 3 rows individually.
The second one does the same but requires the row to contain class="DH, and it matches that row correctly.
The third regex is supposed to match the other rows, which contain class="DD. For some reason, the first result (which should be only the second table row) also includes the previous row.
Even if I add a space between </tr> and <tr>, as in </tr> <tr>, I get the same result. However, if I insert a line break, things work.
Can anyone explain what's going on and how to fix my code?
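/<tr>.*?class="DD.*?/ says "find <tr>, then match as few characters as possible until you reach class="DD". The lazy quantifier only minimizes how far the match extends from a given starting point; it does not make the engine prefer a later starting point. The engine tries starting positions from left to right, so it begins at the first <tr> and sees:
<tr><td class="DH">Sale Date</td></tr><tr><td class="DD">
It matches the first <tr>, then the .*? expands to cover <td class="DH">Sale Date</td></tr><tr><td (plus the space after it), and then it sees class="DD, which matches the next part of the pattern. Because that attempt succeeds, the result includes the first row.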
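To see this for yourself, here is a small sketch that runs only the first half of the pattern and prints where the match starts. It uses the $str from the question; the PREG_OFFSET_CAPTURE flag is added here purely for illustration.

```php
<?php
// Same input string as in the question.
$str = '<tr><td class="DH">Sale Date</td></tr><tr><td class="DD">10-MAR-15</td></tr><tr><td class="DD">18-APR-17</td></tr>';

// Only the first half of the pattern, with offsets captured so the
// starting position of the match is visible.
preg_match('/<tr>.*?class="DD/', $str, $m, PREG_OFFSET_CAPTURE);

echo $m[0][1], "\n"; // 0: the match starts at the very first <tr>
echo $m[0][0], "\n"; // <tr><td class="DH">Sale Date</td></tr><tr><td class="DD
```

The offset is 0, so the match was anchored at the first <tr> and the lazy .*? simply grew until class="DD became reachable.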
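When you add a line break, .*? cannot get past it, because by default . does not match a newline (that requires the s modifier). The attempt that starts at the first <tr> can no longer reach class="DD, so it fails, and the engine moves on to the next <tr>. That is why it works. A space makes no difference because . does match a space.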
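As for a fix, one possible approach (a sketch, not the only option) is to stop the lazy part from crossing a </tr>. The negative lookahead below is my own addition, not something from the question; I also switched the delimiter to ~ so the slashes need no escaping.

```php
<?php
// Same input string as in the question.
$str = '<tr><td class="DH">Sale Date</td></tr><tr><td class="DD">10-MAR-15</td></tr><tr><td class="DD">18-APR-17</td></tr>';

// (?:(?!</tr>).)*? consumes one character at a time, but only while the
// text at the current position is not "</tr>", so the search for
// class="DD cannot run over into the next row.
$pattern = '~<tr>(?:(?!</tr>).)*?class="DD.*?</tr>~';

preg_match_all($pattern, $str, $matches);
print_r($matches[0]);
// Array
// (
//     [0] => <tr><td class="DD">10-MAR-15</td></tr>
//     [1] => <tr><td class="DD">18-APR-17</td></tr>
// )
```

The attempt that starts at the first <tr> now fails when it reaches that row's </tr>, so the engine moves on and matches each DD row on its own.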