Possible Duplicate:
How to parse HTML with PHP?
I need to parse a string inside a td tag. I can do this using jQuery with the following:
$("#right .olddata:first td.numeric:first").html()
If I have the HTML code in a string variable, how can I get the content of the same td?
You can use DOMDocument and DOMXPath.
Example (our HTML is in a string variable $html):
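$doc = new DOMDocument();
// Collect parser warnings from imperfect markup instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);
// Parentheses make each [1] mean "first match in document order", like jQuery's :first
$cells = $xpath->query('((//*[@id="right"]//*[@class="olddata"])[1]//td[@class="numeric"])[1]');
$td = $cells->item(0);
// nodeValue is the cell's text content; it is null here if nothing matched
$tdText = $td !== null ? $td->nodeValue : null;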
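Note that nodeValue gives the text of the cell, whereas jQuery's .html() gives its inner markup. If you need the markup, one possible approach is to serialise the cell's children with DOMDocument::saveHTML(). The following is only a sketch: innerHtml() is a hypothetical helper name, and the sample markup is made up for illustration.

```php
<?php
// Hypothetical helper: concatenate the serialised children of a node,
// which is roughly what jQuery's .html() returns.
function innerHtml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

// Made-up sample input; in practice this is your own $html string.
$html = '<div id="right"><table class="olddata"><tr>'
      . '<td class="numeric">1 <b>234</b></td><td class="numeric">5</td>'
      . '</tr></table></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$td = $xpath->query('(//*[@id="right"]//*[@class="olddata"])[1]//td[@class="numeric"]')->item(0);

$text  = $td !== null ? $td->nodeValue : null; // text only
$inner = $td !== null ? innerHtml($td) : null; // markup of the children

echo $text, PHP_EOL;
echo $inner, PHP_EOL;
```

The exact whitespace and entity encoding of the serialised markup depends on libxml, so compare it loosely rather than byte for byte.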
I think you're looking for the PHP DOM extension. Alternatively, you could just match what you need using regular expressions.
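For the regular expression route, a minimal sketch could look like the following. It assumes the attribute is written exactly as class="numeric", that cells are not nested, and that you only want the first match in the whole string (it does not check for #right or .olddata). The sample markup is made up, and a pattern like this will break as soon as the markup changes, so treat it as a quick hack rather than a parser.

```php
<?php
// Made-up sample input; in practice this is your own $html string.
$html = '<div id="right"><table class="olddata"><tr>'
      . '<td class="label">Total</td><td class="numeric">42</td>'
      . '</tr></table></div>';

// Capture everything between the first <td ... class="numeric" ...> and its </td>.
// "s" lets . match newlines, "i" ignores case, ".*?" keeps the match as short as possible.
$pattern = '~<td\b[^>]*\bclass="numeric"[^>]*>(.*?)</td>~is';

$cell = preg_match($pattern, $html, $m) === 1 ? $m[1] : null;

echo $cell, PHP_EOL;
```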
Simple HTML DOM
Simple HTML DOM provides an object-oriented way of accessing the HTML DOM in PHP. I've used it before with a lot of success, but it will choke on a large DOM structure. A nice feature is the ability to manipulate the DOM and save it through the same object-oriented interface. It also lets you run selector searches against the DOM (in the snippets below, $html is the parsed document object, for example the one returned by str_get_html(), not a raw string):
// Find all <div> elements whose id attribute is "foo"
$ret = $html->find('div[id=foo]');
or:
// Find all <li> in <ul>
foreach ($html->find('ul') as $ul) {
    foreach ($ul->find('li') as $li) {
        // do something...
    }
}
// Find first <li> in first <ul>
$e = $html->find('ul', 0)->find('li', 0);
And it allows for traversal:
echo $html->getElementById("div1")->childNodes(1)->childNodes(1)->childNodes(2)->getAttribute('id');
DOMDocument
As others have noted, you can also use DOMDocument.
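If you would rather stay with plain DOMDocument methods and no XPath, a rough equivalent of the jQuery selector from the question might look like this. hasClass() and firstWithClass() are hypothetical helper names, and the sample markup is made up for illustration.

```php
<?php
// Hypothetical helper: true if the element's class attribute contains $class as a whole word.
function hasClass(DOMElement $el, string $class): bool
{
    $classes = preg_split('/\s+/', trim($el->getAttribute('class')));
    return in_array($class, $classes, true);
}

// Hypothetical helper: first descendant of $root (document order) with the given tag and class.
function firstWithClass(DOMElement $root, string $tag, string $class): ?DOMElement
{
    foreach ($root->getElementsByTagName($tag) as $el) {
        if (hasClass($el, $class)) {
            return $el;
        }
    }
    return null;
}

// Made-up sample input; in practice this is your own $html string.
$html = '<div id="right"><table class="olddata striped"><tr>'
      . '<td>x</td><td class="numeric total">42</td>'
      . '</tr></table></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

// Mirror "#right .olddata:first td.numeric:first" one step at a time.
$value = null;
$right = $doc->getElementById('right');
$old   = $right !== null ? firstWithClass($right, '*', 'olddata') : null;
$td    = $old !== null ? firstWithClass($old, 'td', 'numeric') : null;
if ($td !== null) {
    $value = $td->nodeValue;
}

echo $value, PHP_EOL;
```

Splitting the class attribute on whitespace means elements with several classes still match, which an exact @class="..." comparison would miss.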
XPath
In my experience, XPath is harder to get working, but it's worth the effort if you're only interested in extracting info from the DOM.
It isn't exactly the data you're trying to extract, but here's how I've used XPath to pull info out of an XML document:
The XML:
<?xml version="1.0" encoding="utf-8"?>
<Report>
  <CampaignPerformanceReportColumns>
    <Column name="AccountName" />
    ...
    <Column name="CampaignId" />
  </CampaignPerformanceReportColumns>
  <Table>
    <Row>
      <CampaignName value="Auctions" />
      <GregorianDate value="8/11/2010" />
      ...
      <CampaignId value="60312546" />
    </Row>
    <Row>
      <CampaignName value="Auctions" />
      <GregorianDate value="8/11/2010" />
      ...
      <CampaignId value="60312546" />
    </Row>
    <Row>
      <CampaignName value="Auctions 2" />
      <GregorianDate value="8/11/2010" />
      ...
      <CampaignId value="603125467" />
    </Row>
  </Table>
</Report>
PHP:
$xml = simplexml_load_file($file);
// Get each Row
$result = $xml->xpath("Table/Row");
// Get the CampaignId of each Row
$result = $xml->xpath("//Row/CampaignId");
XPath has many more features; I'd encourage you to explore it if you need to extract a lot of info from any XML-structured document.
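The xpath() calls above return arrays of SimpleXMLElement objects, so you still need to read the value attribute from each one. Here is a small sketch of that step. It uses simplexml_load_string() on a trimmed-down copy of the report so that it runs on its own, and the predicate in the second query is just one example of the extra features mentioned above.

```php
<?php
// Trimmed-down copy of the report above, inlined so the sketch is self-contained.
$xmlString = <<<XML
<Report>
  <Table>
    <Row>
      <CampaignName value="Auctions" />
      <CampaignId value="60312546" />
    </Row>
    <Row>
      <CampaignName value="Auctions 2" />
      <CampaignId value="603125467" />
    </Row>
  </Table>
</Report>
XML;

$xml = simplexml_load_string($xmlString);

// Collect the value attribute of every CampaignId element.
$ids = [];
foreach ($xml->xpath('//Row/CampaignId') as $node) {
    $ids[] = (string) $node['value'];
}

// A predicate narrows the match: the name of the campaign with a given id.
$names = [];
foreach ($xml->xpath('//Row[CampaignId/@value="603125467"]/CampaignName') as $node) {
    $names[] = (string) $node['value'];
}

print_r($ids);
print_r($names);
```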
You should definitely take a peek at DOMDocument->loadHTML().
$doc = new DOMDocument();
$doc->loadHTML("<html><body><p id=\"foo\">bar</p></body></html>");
$foo = $doc->getElementById('foo');
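echo $foo->nodeValue; // Outputs 'bar'
// getElementsByTagName() returns a DOMNodeList, so pick an item before reading nodeValue
$td = $doc->getElementsByTagName('td')->item(0);
echo $td !== null ? $td->nodeValue : ''; // Outputs your <td> value. In this case, nothing.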