I am having trouble loading an XML document into DOM while preserving the difference between empty tags and zero-length strings. Here is an example:
$doc = new DOMDocument("1.0", "utf-8");
$root = $doc->createElement("root");
$doc->appendChild($root);
$element = $doc->createElement("element");
$root->appendChild($element);
echo $doc->saveXML();
This produces the following XML:
<?xml version="1.0" encoding="utf-8"?>
<root><element/></root>
An empty element, exactly as expected. Now let's add an empty text node to the element.
$doc = new DOMDocument("1.0", "utf-8");
$root = $doc->createElement("root");
$doc->appendChild($root);
$element = $doc->createElement("element");
$element->appendChild($doc->createTextNode(""));
$root->appendChild($element);
echo $doc->saveXML();
This produces the following XML:
<?xml version="1.0" encoding="utf-8"?>
<root><element></element></root>
A non-empty element containing a zero-length string. Good! But when I try:
$doc = new DOMDocument();
$doc->loadXML($xml);
echo $doc->saveXML($doc);
on either of these XML documents, I always get:
<?xml version="1.0" encoding="utf-8"?>
<root><element/></root>
i.e. the zero-length string is removed and only an empty element is loaded. I believe this happens in loadXML(). Is there any way to convince DOMDocument::loadXML() not to convert a zero-length string into an empty element? Ideally, the DOM would contain a text node with a zero-length string as the element's child.
The solution needs to use PHP DOM because of how the loaded data is processed afterwards.
The difficulty in distinguishing between the two is that DOMDocument simply follows the spec when it loads the serialized XML document.
By the book, in <element></element> there is no empty text node inside that element, as others have already commented.
However, DOMDocument is perfectly fine with it if you insert an empty text node there yourself. You can then easily distinguish between a self-closing tag (no children) and an empty element (one child, an empty text node).
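As a minimal sketch of that distinction (the variable names are only illustrative), the following compares an element parsed by loadXML() with one built by hand and checks each with hasChildNodes():

```php
<?php
// Case 1: parsed from a string. The parser creates no text node
// for <element></element>, so the element has no children.
$loaded = new DOMDocument();
$loaded->loadXML('<root><element></element></root>');
$loadedElement = $loaded->documentElement->firstChild;
var_dump($loadedElement->hasChildNodes()); // bool(false)

// Case 2: built by hand with an explicit zero-length text node.
$built = new DOMDocument();
$root = $built->appendChild($built->createElement('root'));
$builtElement = $root->appendChild($built->createElement('element'));
$builtElement->appendChild($built->createTextNode(''));
var_dump($builtElement->hasChildNodes()); // bool(true)

echo $built->saveXML($built->documentElement), "\n";
```

So the information survives in the DOM only if something puts the empty text node there after parsing.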
So how do you get those empty text nodes in? One way is the XMLReader-based XMLReaderIterator library, specifically its DOMReadingIteration, which builds up the document while offering each current XMLReader node for interaction:
// Assumes the XMLReaderIterator library is already loaded
// and that $xml holds the XML string to read.
$reader = new XMLReader();
$reader->XML($xml);

$doc = new DOMDocument();
$iterator = new DOMReadingIteration($doc, $reader);
foreach ($iterator as $index => $value) {
    // Preserve empty elements as non-self-closing by giving them a single
    // child text node that has zero-length text
    if ($iterator->isEndElementOfEmptyElement()) {
        $iterator->getLastNode()->appendChild(new DOMText(''));
    }
}
echo $doc->saveXML();
For your input:
<?xml version="1.0" encoding="utf-8"?>
<root><element></element></root>
this gives the following output:
<?xml version="1.0"?>
<root><element></element></root>
No strings attached: a finely built DOMDocument. The example is from examples/read-into-dom.php and shows that this is not a problem when you load the document via XMLReader and handle that single special case yourself.
There is no difference between the two for the XML parser that loads the document. The resulting DOM is exactly the same.
If you load and save an XML format that has a problem with empty tags, you can use an option to avoid empty tags on save:
$dom = new DOMDocument();
$dom->appendChild($dom->createElement('foo'));
echo $dom->saveXml();
echo "\n";
echo $dom->saveXml(NULL, LIBXML_NOEMPTYTAG);
Output:
<?xml version="1.0"?>
<foo/>
<?xml version="1.0"?>
<foo></foo>
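A minimal sketch of the same option applied to a loaded document follows; the input string is made up for illustration. Note that LIBXML_NOEMPTYTAG expands every empty element on save, so it does not remember which elements were self-closing in the source:

```php
<?php
$dom = new DOMDocument();
// One self-closing element and one with explicit start and end tags.
$dom->loadXML('<root><a/><b></b></root>');

// Default serialization collapses both to self-closing tags.
$collapsed = $dom->saveXML();
echo $collapsed;

// With LIBXML_NOEMPTYTAG both are written with start and end tags.
$expanded = $dom->saveXML(null, LIBXML_NOEMPTYTAG);
echo $expanded;
```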
You can trick XSLT processors into not emitting self-closing elements by adding an xsl:value-of that pretends to insert a value, where that value is the empty string ''.
Input:
<?xml version="1.0" encoding="utf-8"?>
<root>
  <foo>
    <bar some="value"></bar>
    <self-closing attr="foobar" val="3.5"/>
  </foo>
  <goo>
    <gle>
      <nope/>
    </gle>
  </goo>
</root>
Stylesheet:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="*[not(node())]">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:attribute name="{name()}">
          <xsl:value-of select="."/>
        </xsl:attribute>
      </xsl:for-each>
      <xsl:value-of select="''"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Output:
<?xml version="1.0" encoding="utf-8"?>
<root>
  <foo>
    <bar some="value"></bar>
    <self-closing attr="foobar" val="3.5"></self-closing>
  </foo>
  <goo>
    <gle>
      <nope></nope>
    </gle>
  </goo>
</root>
To solve this in PHP without an XSLT processor, I can only think of adding empty text nodes to all elements that have no children (as you do when creating the XML).
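A minimal sketch of that idea, assuming a hypothetical helper named expandEmptyElements() (it is not part of PHP DOM) and a shortened version of the input above. It uses the same *[not(node())] match as the stylesheet:

```php
<?php
// Hypothetical helper: append a zero-length text node to every
// element that has no child nodes at all.
function expandEmptyElements(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);
    // Same match as the stylesheet: elements without any child node.
    foreach ($xpath->query('//*[not(node())]') as $element) {
        $element->appendChild($doc->createTextNode(''));
    }
}

$doc = new DOMDocument();
$doc->loadXML('<root><foo><bar some="value"></bar><nope/></foo></root>');
expandEmptyElements($doc);

$result = $doc->saveXML();
echo $result;
```

Because the loaded DOM no longer records which elements were self-closing in the source, this treats every childless element the same way.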