Simple HTML DOM parser not working

I'm trying to extract emails, names and phone numbers from my html table and use these details in order to send an automatic email reply.

For some reason, I get a fatal error saying: Call to undefined function file_get_html() in http://itecdigital.org.uk/2015/430926/BeautyFactoryBooking/admin.php on line 3

My code for the html Dom parser:

<?php

$html = file_get_html('http://itecdigital.org.uk/2015/430926/BeautyFactoryBooking/admin.php');

$dom = new DOMDocument();
$dom->loadHTML($html);

$elements = $dom->getElementsByTagName('tr');
//Loop through each row
foreach ($rows as $row) {
    //Loop through each child (cell) of the row
    foreach ($row->children() as $cell) {
        echo $cell->plaintext; // Display the contents of each cell - this is the value you want to extract
    }
}

?>

Can anyone see what's wrong with this?

My html code for the table is as follows:

<?php

        echo "<table style='border: solid 1px black;'>";
        echo "<tr><th>Id</th><th>First Name</th><th>Last Name</th><th>Email Address</th><th>Phone Num</th><th>Treatment</th><th>Date</th><th>Time</th><th>Message</th><th>Reply</th></tr>";

        class TableRows extends RecursiveIteratorIterator {
            function __construct($it) {
                parent::__construct($it, self::LEAVES_ONLY);
            }

            function current() {
                return "<td style='width:100px;border:1px solid black;'>" . parent::current(). "</td>";
            }

            function beginChildren() {
                echo "<tr>";
            }

            function endChildren() {
                echo "</tr>" . "\n";
            }
        }

        $servername = "#";
        $username = "#";
        $password = "#";
        $dbname = "#";

        try {
            $conn = new PDO("mysql: host=$servername; dbname=$dbname", $username, $password);
            $conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
            $stmt = $conn->prepare("SELECT Booking_request_form.id_booking, Client_Information.first_name, Client_Information.last_name, Client_Information.email_address, Client_Information.phone_number, Booking_request_form.treatment, Booking_request_form.date, Booking_request_form.time, Booking_request_form.message FROM Booking_request_form INNER JOIN Client_Information WHERE Client_Information.id_client=Booking_request_form.client_fk"); 

            $stmt->execute();

            // set the resulting array to associative
            $result = $stmt->setFetchMode(PDO::FETCH_ASSOC);
            foreach(new TableRows(new RecursiveArrayIterator($stmt->fetchAll())) as $k=>$v) {
                echo $v;
            }
        }

        catch(PDOException $e) {
            echo "Error: " . $e->getMessage();
        }

        $conn = null;
        echo "</table>";

?> 

Is there an easy fix to this?

You are mixing commands from the third-party Simple HTML DOM class (as per your question title) with commands from the built-in DOMDocument class, so your code can't work.

file_get_html() is a Simple HTML Dom function, replace it with file_get_contents():

$html = file_get_contents( '/Users/sam/Downloads/trash.html' );

$dom = new DOMDocument();
libxml_use_internal_errors( true );   // <-- add this line to suppress libxml parse warnings
$dom->loadHTML( $html );

$elements = $dom->getElementsByTagName('tr');
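
As a minimal sketch of this step, the following parses an inline HTML string instead of the remote page (the sample markup and the variable $sample are made up for illustration). It shows that libxml_use_internal_errors() collects parser messages in a buffer rather than printing them as PHP warnings; how many messages you get depends on the libxml version:

```php
<?php
// Hypothetical sample markup; <header> is an HTML5 tag that older libxml HTML parsers report as invalid.
$sample = '<html><body><header>Menu</header>'
        . '<table><tr><th>Id</th><th>Email Address</th></tr>'
        . '<tr><td>1</td><td>someone@example.com</td></tr></table></body></html>';

// Collect libxml messages internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($sample);

// Read whatever the parser complained about, then clear the buffer
// and restore the previous error-handling mode.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$elements = $dom->getElementsByTagName('tr');
echo "Rows found: " . $elements->length . "\n";
echo "Parser messages collected: " . count($errors) . "\n";
```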

Now, initialise an array ($rows) to hold the cell values and an integer ($cols) to hold the column count; your HTML is malformed, and this variable will help you produce a well-formed table:

$rows = array();
$cols = 0;

Your code has another error: you store the <tr> elements in $elements, but then iterate over $rows in foreach(). You also call ->children() to iterate over the cells, but DOMElement has no such method; use the ->childNodes property instead. Before looping over the cells, check the row's cell count and update the previously declared $cols if it is larger. Inside the nested foreach(), add each cell value to $rows for display later. To read the value of a DOMNode, use ->nodeValue instead of ->plaintext. I have wrapped $cell->nodeValue in trim() to strip whitespace from the start and end of the string:

foreach ($elements as $key => $row)
{
    if( $row->childNodes->length > $cols ) $cols = $row->childNodes->length;
    foreach( $row->childNodes as $cell )
    {
        $rows[$key][] = trim( $cell->nodeValue );
    }
}
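
One caveat with ->childNodes: it can also return whitespace text nodes when the markup has line breaks or indentation between cells, which would inflate $cols and add empty entries to $rows. A hedged variant of the loop, run here against a made-up inline table ($sample is hypothetical), keeps only element nodes:

```php
<?php
// Hypothetical table with line breaks and indentation between the cells.
$sample = "<table>\n<tr>\n  <th>Id</th>\n  <th>Email Address</th>\n</tr>\n"
        . "<tr>\n  <td>1</td>\n  <td> someone@example.com </td>\n</tr>\n</table>";

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($sample);

$rows = array();
$cols = 0;
foreach ($dom->getElementsByTagName('tr') as $key => $row) {
    $rows[$key] = array();
    foreach ($row->childNodes as $cell) {
        // Skip whitespace text nodes and comments; keep only <td>/<th> elements.
        if ($cell->nodeType !== XML_ELEMENT_NODE) {
            continue;
        }
        $rows[$key][] = trim($cell->nodeValue);
    }
    // Track the widest row, counting real cells only.
    $cols = max($cols, count($rows[$key]));
}

print_r($rows);
```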

Now you have the cell values in the multidimensional array $rows.


Table display

The code you use to display the table is a copy-and-paste from the net rather than your own; it has nothing to do with your question, so you can ignore it.

Use simple code like this instead:

echo "<table>\n";
echo "    <tr>\n";
for( $j = 0; $j < $cols; $j++ ) echo "        <th>{$rows[0][$j]}</th>\n";
echo "    </tr>\n";
for( $i = 1; $i < count($rows); $i++ )
{
    echo "    <tr>\n";
    for( $j = 0; $j < $cols; $j++ )
    {
        if( isset( $rows[$i][$j] ) ) echo "        <td>{$rows[$i][$j]}</td>\n";
        else                         echo "        <td></td>\n";
    }
    echo "    </tr>\n";
}
echo "</table>\n";

This is only a working example; modify the HTML as you prefer. You can also change the order of the cells. Note that the table header and the table rows are printed by different code (the row for() loop starts from 1). Also note the use of $cols: if a row has fewer cells than $cols, we output an empty <td> for each missing cell.
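
Since the goal is to pull out names, email addresses and phone numbers, it may help to key each row by its header text. A minimal sketch, assuming $rows has the shape built above and that the header labels match those in the question's table (the sample data and the $bookings variable are hypothetical):

```php
<?php
// Hypothetical $rows, in the same shape the parsing loop produces.
$rows = array(
    array('Id', 'First Name', 'Last Name', 'Email Address', 'Phone Num'),
    array('1', 'Jane', 'Doe', 'jane@example.com', '0123456789'),
    array('2', 'John', 'Roe', 'john@example.com'),  // short row: phone missing
);

$headers  = $rows[0];
$bookings = array();
for ($i = 1; $i < count($rows); $i++) {
    // Pad short rows (and trim long ones) so every header gets exactly one value.
    $cells      = array_pad($rows[$i], count($headers), '');
    $bookings[] = array_combine($headers, array_slice($cells, 0, count($headers)));
}

foreach ($bookings as $booking) {
    echo $booking['First Name'] . ' ' . $booking['Last Name']
       . ' <' . $booking['Email Address'] . '> '
       . $booking['Phone Num'] . "\n";
}
```

Each entry in $bookings can then be passed to whatever code sends the reply email.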

Use the file_get_contents() function instead of file_get_html(). PHP has no built-in function called file_get_html(); it belongs to the Simple HTML DOM library.

However, there are also a few errors in the HTML:

  1. Unclosed tag <div class="headertext">. I suppose the closing tag should come right after <a href="log_out.php">Logout</a>;
  2. Characters like & should be encoded as the entity &amp;;
  3. It could be considered a bug, but PHP's HTML parser doesn't recognise the <header> tag and throws a warning. It can still load the HTML page successfully, though.
  4. Last but not least, there are a number of mistakes in how the DOMElement properties and methods are used.
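
For point 2, one way to encode such characters when the page is generated is PHP's htmlspecialchars(). A small sketch, assuming a hypothetical $treatment value like the one in the table:

```php
<?php
// Hypothetical cell value containing a raw ampersand.
$treatment = 'Waxing - Full Leg & Bikini';

// Encode &, <, > and quotes so the value is safe inside a <td>.
$cell = '<td>' . htmlspecialchars($treatment, ENT_QUOTES, 'UTF-8') . '</td>';

echo $cell . "\n";   // <td>Waxing - Full Leg &amp; Bikini</td>
```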

I have rewritten your code to show you how it could work:

<?php

$html = file_get_contents('http://itecdigital.org.uk/2015/430926/BeautyFactoryBooking/admin.php');

$dom = new DOMDocument();
$result = $dom->loadHTML($html, LIBXML_NOERROR);
var_dump($result);
$elements = $dom->getElementsByTagName('tr');
//Loop through each row
var_dump($elements);
foreach ($elements as $row) {
    //Loop through each child (cell) of the row
    foreach ($row->childNodes as $cell) {
        echo $cell->nodeValue; // Display the contents of each cell - this is the value you want to extract
    }
}


?>

and the HTML should look like this:

<!DOCTYPE html>
<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
      <meta http-equiv="X-UA-Compatible" content="chrome=1,IE=edge" />
      <title>Beauty Factory Bookings</title>
      <link href='http://fonts.googleapis.com/css?family=Montserrat:400,700' rel='stylesheet' type='text/css'>
   </head>
   <body>
      <img action="login_success.php" src="http://i.imgur.com/wbhPNAs.png" style="width: 240px; height:35px;"> 
      <header>
         <div class="headertext"> <a href="booking.php">Book Appointment</a> <a href="about.php">About Us</a> <a href="contact.php">Contact Us</a> <a href="log_out.php">Logout</a></div>
      </header>
      <table style='border: solid 1px black;'>
         <tr>
            <th>Id</th>
            <th>First Name</th>
            <th>Last Name</th>
            <th>Email Address</th>
            <th>Phone Num</th>
            <th>Treatment</th>
            <th>Date</th>
            <th>Time</th>
            <th>Message</th>
            <th>Reply</th>
         </tr>
         <tr>
            <td style='width:100px;border:1px solid black;'>1</td>
            <td style='width:100px;border:1px solid black;'>Filip</td>
            <td style='width:100px;border:1px solid black;'>Grebowski</td>
            <td style='width:100px;border:1px solid black;'>grebowskifilip@gmail.com</td>
            <td style='width:100px;border:1px solid black;'>07449474894</td>
            <td style='width:100px;border:1px solid black;'>Waxing - Full Leg &amp; Bikini</td>
            <td style='width:100px;border:1px solid black;'>11/03/2016</td>
            <td style='width:100px;border:1px solid black;'>10:20</td>
            <td style='width:100px;border:1px solid black;'>Is this okay?</td>
         </tr>
         <tr>
            <td style='width:100px;border:1px solid black;'>2</td>
            <td style='width:100px;border:1px solid black;'>Filip</td>
            <td style='width:100px;border:1px solid black;'>Grebowski</td>
            <td style='width:100px;border:1px solid black;'>grebowskifilip@gmail.com</td>
            <td style='width:100px;border:1px solid black;'>07449474894</td>
            <td style='width:100px;border:1px solid black;'>Anti-Age Facial</td>
            <td style='width:100px;border:1px solid black;'>01/01/1970</td>
            <td style='width:100px;border:1px solid black;'>10:20</td>
            <td style='width:100px;border:1px solid black;'>Is this ok????</td>
         </tr>
      </table>
   </body>
   <style> table { margin-top: 60px; border-collapse: collapse; margin-left: auto; margin-right: auto; margin-bottom: 60px; } tr:nth-child(even) { background-color: #f2f2f2 } th, td { padding: 15px; } img { padding-top: 12px; padding-left: 12px; } .headertext { float: right; padding-top: 20px; padding-right: 3%; } body { background: url('#') no-repeat fixed center center; background-size: cover; font-family: 'Montserrat', sans-serif; margin: 0; padding: 0; } header { background: black; -ms-filter: "progid:DXImageTransform.Microsoft.Alpha(Opacity=50)"; filter: alpha(opacity=80); -moz-opacity: 0.8; -khtml-opacity: 0.8; opacity: 0.7; height: 60px; font-family: 'Montserrat', sans-serif; } a:link { font-size: 15px; margin-left: 75px; color: white; background-color: transparent; text-decoration: none; } a:visited { font-size: 15px; margin-left: 75px; color: white; background-color: transparent; text-decoration: none; } a:hover { font-size: 15px; margin-left: 75px; color: #C0C0C0; background-color: transparent; text-decoration: none; } </style>
</html>

Your HTML should have a proper HTML structure, not just the table:

<!DOCTYPE html>
<html>
<body>
    <?php
        echo "<table style='border: solid 1px black;'>";
        /* etc */
    ?>
</body>
</html>

Also, make sure to correctly close tags in the PHP output.


*EDIT*

I just researched Simple HTML DOM.

Make sure to include the library file in your code: include("/path/to/simple_html_dom.php");

Furthermore, with Simple HTML DOM you don't need to load $html into a DOMDocument. Simply write:

$html = file_get_html('http://itecdigital.org.uk/2015/430926/BeautyFactoryBooking/admin.php');

$elements = $html->find('tr');

Please read the PHP Simple HTML DOM Parser Manual for more information.