Getting the root node from the DOMDocument class

I am trying to get the root node of a PHP DOM Document. This is usually done by doing something like this:

$doc->documentElement;

However, trying this on an HTML string that contains a doctype:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml">...

and that is loaded into a DOM Document object like so:

$doc = new DOMDocument();
$doc->loadHTML($html);

returns the root node as the html tag and not the doctype tag! I am guessing this is because of the unusual <! characters. Is there any way to return the root node correctly?

Doctype isn't the root node, html is. The doctype is simply the doctype declaration that tells the browser what the rest of the file is.

Maybe you can use DOMDocument::doctype? ($doc->doctype)

The DOCTYPE is not an element, and it certainly isn't the root element. It is exposed as a separate DOMDocumentType node, so try $doc->doctype.
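To check the answers above, here is a minimal sketch that loads a document with a doctype and reads both properties. The markup in $html is a made-up stand-in for the truncated string in the question, and the call to libxml_use_internal_errors() is only my assumption for keeping parser warnings quiet.

```php
<?php
// Hypothetical input: a complete document with the XHTML doctype from the question.
$html = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
      . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
      . '<html xmlns="http://www.w3.org/1999/xhtml">'
      . '<head><title>Example</title></head><body><p>Hi</p></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// The root element of the document (the <html> element).
$root = $doc->documentElement;

// The doctype declaration, a DOMDocumentType object (or null if there is none).
$doctype = $doc->doctype;

echo 'Root element: ', $root->nodeName, "\n";
if ($doctype !== null) {
    echo 'Doctype name: ', $doctype->name, "\n";
    echo 'Public ID:    ', $doctype->publicId, "\n";
    echo 'System ID:    ', $doctype->systemId, "\n";
}
```

So both pieces of information are available. They just live in different properties, and documentElement always refers to the root element rather than the declaration before it.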

I ran into this problem some time ago, and in my case it was because I didn't want the DOCTYPE in there at all. I was parsing HTML snippets and had a hard time keeping the returned markup free of the DOCTYPE and HTML tags that were added when they shouldn't have been.

I am going to present an answer that isn't here yet, just in case you're having the same problem I had. My solution stops a default DOCTYPE from being added when the input has none, provided you have a recent enough version of PHP. I believe it needs at least PHP 5.4 and libxml 2.7.8. If you have both, then it's as simple as passing a constant as the second (options) argument of the DOMDocument object's loadHTML method. The constant is LIBXML_HTML_NODEFDTD and it is used like this:

$doc = new DOMDocument();
$doc->loadHTML($someContentString, LIBXML_HTML_NODEFDTD);

This way there is no additional parsing needed at all and you can go about your life without this DOCTYPE problem... unless you needed the DOCTYPE tag, in which case ignore my answer and let someone else find it through Google :)
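To see what the flag changes, here is a rough sketch that parses the same fragment with and without it. The $fragment string and the variable names are just for illustration, and I am assuming a PHP build where LIBXML_HTML_NODEFDTD is defined.

```php
<?php
// Hypothetical snippet with no doctype of its own.
$fragment = '<p>Hello</p>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

// Parsed with default options.
$withDefaults = new DOMDocument();
$withDefaults->loadHTML($fragment);

// Parsed with the flag that suppresses the default doctype.
$withoutDtd = new DOMDocument();
$withoutDtd->loadHTML($fragment, LIBXML_HTML_NODEFDTD);

// Compare whether each document ended up with a doctype node.
var_dump($withDefaults->doctype !== null);
var_dump($withoutDtd->doctype !== null);

// Serialize the flagged document to inspect the resulting markup.
echo $withoutDtd->saveHTML();
```

Note that this flag only deals with the DOCTYPE. As far as I know the implied html and body wrappers are controlled separately by LIBXML_HTML_NOIMPLIED, which can be combined with it using the | operator, so check the serialized output if you only want the bare fragment back.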