preg_match_all for a table with case-sensitive code

I am trying to extract railway ticket data for internal use.

Total data looks like this table.

I have extracted the content of every <td> with preg_match_all, but I cannot extract the coach position, as seen in this screenshot.

I have tried the code below:

<?php
    $result='tables code over here which you can find in pastebin link';
    preg_match_all('/<TD class="table_border_both"><b>(.*)<\/b><\/TD>/s',$result,$matches);
    var_dump($matches);
?>

I get rubbish output like:

You can use the following regular expression:

$re = "/<TD class=\"table_border_both\"><b>([0-9][0-9])
<\/b><\/TD>/"; 
$str = "<table width=\"100%\" border=\"0\" cellpadding=\"0\" cellspacing=\"1\" class=\"table_border\">

<tr>
<td colspan=\"9\" class=\"heading_table_top\">Journey Details</td>
</tr>
<TR class=\"heading_table\">
<td width=\"11%\">Train Number</Td>
<td width=\"16%\">Train Name</td>
<td width=\"18%\">Boarding Date <br>(DD-MM-YYYY)</td>
<td width=\"7%\">From</Td>
<td width=\"7%\">To</Td>
<td width=\"14%\">Reserved Upto</Td>
<td width=\"21%\">Boarding Point</Td>
<td width=\"6%\">Class</Td>
</TR>
<TR>
<TD class=\"table_border_both\">*12559</TD>
<TD class=\"table_border_both\">SHIV GANGA EXP </TD>
<TD class=\"table_border_both\"> 5- 7-2014</TD>
<TD class=\"table_border_both\">BSB </TD>
<TD class=\"table_border_both\">NDLS</TD>
<TD class=\"table_border_both\">NDLS</TD>
<TD class=\"table_border_both\">BSB </TD>
<TD class=\"table_border_both\"> SL</TD>
</TR>
</table>
<TABLE width=\"100%\" border=\"0\" cellpadding=\"0\" cellspacing=\"1\" class=\"table_border\" id=\"center_table\" >

<TR>
<td width=\"25%\" class=\"heading_table_top\">S. No.</td>
<td width=\"45%\" class=\"heading_table_top\">Booking Status <br /> (Coach No , Berth No., Quota)</td>
<td width=\"30%\" class=\"heading_table_top\">* Current Status <br />(Coach No , Berth No.)</td>
<td width=\"30%\" class=\"heading_table_top\">Coach Position</td>
</TR>
<TR>
<TD class=\"table_border_both\"><B>Passenger 1</B></TD>
<TD class=\"table_border_both\"><B>S1  , 33,CK    </B></TD>
<TD class=\"table_border_both\"><B>S1  , 33</B></TD>
<TD class=\"table_border_both\"><b>11
</b></TD>
</TR>
<TR>
<TD class=\"table_border_both\"><B>Passenger 2</B></TD>
<TD class=\"table_border_both\"><B>S1  , 34,CK    </B></TD>
<TD class=\"table_border_both\"><B>S1  , 34</B></TD>
<TD class=\"table_border_both\"><b>11
</b></TD>
</TR>
<TR>
<TD class=\"table_border_both\"><B>Passenger 3</B></TD>
<TD class=\"table_border_both\"><B>S1  , 36,CK    </B></TD>
<TD class=\"table_border_both\"><B>S1  , 36</B></TD>
<TD class=\"table_border_both\"><b>11
</b></TD>
</TR>
<TR>
<TD class=\"table_border_both\"><B>Passenger 4</B></TD>
<TD class=\"table_border_both\"><B>S1  , 37,CK    </B></TD>
<TD class=\"table_border_both\"><B>S1  , 37</B></TD>
<TD class=\"table_border_both\"><b>11
</b></TD>
</TR>
<TR>
<td class=\"heading_table_top\">Charting Status</td>
<TD colspan=\"3\" align=\"middle\" valign=\"middle\" class=\"table_border_both\">   CHART PREPARED   </TD>
</TR>
<TR>
<td colspan=\"4\"><font color=\"#1219e8\" size=\"1\"><b> * Please Note that in case the Final Charts have not been prepared, the Current Status might upgrade/downgrade at a later stage.</font></b></Td>
</TR>
</table>"; 

preg_match_all($re, $str, $matches);

A useful website for testing regular expressions: http://regex101.com/

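$regexp = '/<td class="table_border_both"><b>(.*?)\s*<\/b><\/td>/i';

The "Coach Position" <td> contains a line break, and your regexp does not account for it. It is better to use \s*, so that spaces or line breaks before the closing tag do not make the match fail. The i modifier makes the match case-insensitive, which covers both <TD> and <td>, <B> and <b>. PHP has no g modifier: preg_match_all already returns every match.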
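To illustrate the effect of \s*, here is a minimal sketch on a hypothetical two-cell sample modelled on the table in the question. The variable names ($html, $strict, $tolerant) are made up for the example.

```php
<?php
// Hypothetical sample: one single-line cell and one cell with a line break before </b>.
$html = "<TD class=\"table_border_both\"><B>S1  , 33</B></TD>\n"
      . "<TD class=\"table_border_both\"><b>11\n</b></TD>\n";

// Without \s*, "." cannot step over the line break, so the second cell is skipped.
$strict = '/<td class="table_border_both"><b>(.*?)<\/b><\/td>/i';

// With \s*, spaces and line breaks before the closing </b> are absorbed.
$tolerant = '/<td class="table_border_both"><b>(.*?)\s*<\/b><\/td>/i';

preg_match_all($strict, $html, $strictMatches);
preg_match_all($tolerant, $html, $tolerantMatches);

print_r($strictMatches[1]);   // one entry: the single-line cell
print_r($tolerantMatches[1]); // two entries: both cells
```

The lazy (.*?) keeps the capture from swallowing trailing whitespace, which \s* then consumes.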


You know that the table has 4 columns, so the captured values (the first group, $matches[1]) can be transformed further:

$data = array_chunk($matches[1], 4); // split the captured values into rows of 4

Now you have the rows. A few more lines give each column a name:

$data = array_map(function (array $row) {
    return array_combine(['snum', 'status_book', 'status_cur', 'position'], $row);
}, $data); // assign each column in the row its name

If we combine all the code, it will probably look like this:

$data = array_map(function (array $row) {
    return array_combine(['snum', 'status_book', 'status_cur', 'position'], $row);
}, array_chunk($matches[1], 4)); // $matches[1] holds the captured cell contents
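For reference, a minimal end-to-end sketch of these steps, run on a hypothetical two-row sample in the same shape as the question's table. It chunks the captured group $matches[1] rather than the whole $matches array. The $html, $keys and $rows names are only illustrative.

```php
<?php
// Hypothetical sample: two passenger rows copied in shape from the question's table.
$html = <<<HTML
<TR>
<TD class="table_border_both"><B>Passenger 1</B></TD>
<TD class="table_border_both"><B>S1  , 33,CK    </B></TD>
<TD class="table_border_both"><B>S1  , 33</B></TD>
<TD class="table_border_both"><b>11
</b></TD>
</TR>
<TR>
<TD class="table_border_both"><B>Passenger 2</B></TD>
<TD class="table_border_both"><B>S1  , 34,CK    </B></TD>
<TD class="table_border_both"><B>S1  , 34</B></TD>
<TD class="table_border_both"><b>11
</b></TD>
</TR>
HTML;

// Case-insensitive, lazy capture, whitespace tolerated before </b>.
preg_match_all('/<td class="table_border_both"><b>(.*?)\s*<\/b><\/td>/i', $html, $matches);

$keys = ['snum', 'status_book', 'status_cur', 'position'];

// Group the flat list of cell values into rows of 4, then name each column.
$rows = array_map(
    function (array $row) use ($keys) {
        return array_combine($keys, $row);
    },
    array_chunk($matches[1], 4)
);

print_r($rows);
```

This assumes the number of matched cells is a multiple of 4. If a row is incomplete, array_combine will fail on the short chunk, so it may be worth checking count($matches[1]) % 4 first.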

\s+ is needed because there is whitespace (a line break) between the number and the closing </b>; without it the pattern will not match.

$data = file_get_contents("http://pastebin.com/raw.php?i=zJrvq95H");
preg_match_all('#<b>([0-9]+)\s+</b>#', $data, $matches); // lowercase <b> only occurs in the coach position cells
print_r($matches[1]);

Result:

Array
(
    [0] => 11
    [1] => 11
    [2] => 11
    [3] => 11
)