I am trying to get the root node of a PHP DOM Document. This is usually done by doing something like this:
$doc->documentElement;
However, trying this on an HTML string that contains a doctype:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml">...
and that is loaded into a DOM Document object like so:
$doc = new DOMDocument();
$doc->loadHTML($html);
returns the html tag as the root node and not the doctype tag! I am guessing this is because of the weird <! characters. Is there any way to return the root node correctly?
The doctype isn't the root node; html is. The doctype is simply a declaration that tells the browser how to interpret the rest of the file.
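To see this for yourself, here is a minimal sketch. The markup is a shortened stand-in for the page in the question (the head and body content is made up), and libxml_use_internal_errors() is only there to keep parser warnings quiet:

```php
<?php
// Shortened stand-in for the question's document; the body content is hypothetical.
$html = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
      . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
      . '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Test</title></head>'
      . '<body><p>Hello</p></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// The root element is <html>, even though the doctype comes first in the source.
$root = $doc->documentElement;
echo $root->nodeName, "\n"; // html

// The doctype is a child of the document itself, next to <html>, not above it.
foreach ($doc->childNodes as $child) {
    echo get_class($child), "\n";
}
```

So the doctype and the html element both hang off the document; documentElement just picks the element.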
Maybe you can use DOMDocument::doctype? ($doc->doctype)
The DOCTYPE is not an element (in the DOM it is a separate DOMDocumentType node), and it certainly isn't the root node. Try $doc->doctype.
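A rough sketch of what reading $doc->doctype looks like, assuming a document like the one in the question (the markup below is a shortened, made-up stand-in). The name, publicId and systemId properties come from DOMDocumentType:

```php
<?php
// Shortened stand-in for the question's document; the body content is hypothetical.
$html = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
      . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
      . '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hello</p></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// $doc->doctype is a DOMDocumentType object, or null if the document has none.
$doctype = $doc->doctype;
if ($doctype !== null) {
    echo 'name: ', $doctype->name, "\n";
    echo 'publicId: ', $doctype->publicId, "\n";
    echo 'systemId: ', $doctype->systemId, "\n";
}
```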
I ran into this problem some time ago, and it was because I actually didn't want the DOCTYPE in there at all. I was working with code snippets and was having a hard time getting the returned values back without the DOCTYPE and HTML tags that were added when they shouldn't have been.
I am going to present an answer that isn't here yet, just in case you're having the same problem I had. My solution prevents a default DOCTYPE from being added, provided you have a newer version of PHP. I believe it requires at least PHP 5.4 and libxml 2.7.8. If you meet both requirements, it's as simple as passing a constant flag as the second argument to the DOMDocument object's loadHTML method. The constant is LIBXML_HTML_NODEFDTD and it is used like this:
$doc = new DOMDocument();
$doc->loadHTML($someContentString, LIBXML_HTML_NODEFDTD);
This way there is no additional parsing needed at all, and you can go about your life without this DOCTYPE problem... unless you needed the DOCTYPE tag, in which case ignore my answer and let someone else find it through Google :)
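For anyone who wants to check the flag's effect, here is a small sketch. The snippet string and variable names are made up for illustration; it loads the same fragment with and without LIBXML_HTML_NODEFDTD and looks at the doctype each time:

```php
<?php
// Hypothetical snippet with no doctype of its own.
$snippet = '<p>Hello <b>world</b></p>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

// Default behaviour: the parser normally adds a default doctype for you.
$default = new DOMDocument();
$default->loadHTML($snippet);
var_dump($default->doctype !== null);

// With the flag: no default doctype is added.
$noDtd = new DOMDocument();
$noDtd->loadHTML($snippet, LIBXML_HTML_NODEFDTD);
var_dump($noDtd->doctype === null);

// The serialized output should no longer start with a doctype line.
$output = $noDtd->saveHTML();
echo $output;
```

Note that this flag only deals with the doctype. The html and body wrappers are a separate matter, controlled by LIBXML_HTML_NOIMPLIED, which is not shown here.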