I am adding content to an XML document using PHP's file_put_contents and then opening that document in Microsoft Word. The problem is that if I add the Euro currency symbol (&euro;), the document breaks and I get the following error:
&euro; is not a valid XML entity.
Trying to solve encoding issues with entities is a bad practice. Instead, make sure all your strings are properly UTF-8.
Have you tried using '&#8364;'? And make sure you clean up your string using the snippet below:
// Removes every character outside printable ASCII (this also strips a literal €).
$currentString = preg_replace('/[^!-~ ]/', '', $currentString);
First, make sure that your strings actually are UTF-8. PHP's XML methods and functions expect UTF-8 input, independent of the output encoding. It is possible to work with other character sets and encodings, but this gets complex quickly.
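A minimal sketch of that check, assuming the mbstring extension is available and that any input that is not valid UTF-8 comes from a Windows-1252 source (both are assumptions, as is the helper name toUtf8):

```php
<?php
// Hypothetical helper: return $value as UTF-8.
// Assumes anything that is not already valid UTF-8 is Windows-1252.
function toUtf8(string $value, string $fallbackEncoding = 'Windows-1252'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($value, 'UTF-8', $fallbackEncoding);
}

$legacy = "\x80 42.00";           // 0x80 is the euro sign in Windows-1252
$utf8   = toUtf8($legacy);        // converted to the UTF-8 byte sequence
$same   = toUtf8("\u{20AC} 42.00"); // valid UTF-8 passes through unchanged
```

If the real source encoding differs, the fallback has to be changed accordingly; guessing the wrong one produces valid but incorrect characters.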
If you create the XML using an XML API like DOM or XMLWriter, it will take care of the encoding as needed. In a UTF-8 XML document the € does not need to be encoded.
$document = new DOMDocument('1.0', 'UTF-8');
$document
->appendChild($document->createElement('price'))
->appendChild($document->createTextNode('€ 42.00'));
echo $document->saveXml();
Output:
<?xml version="1.0" encoding="UTF-8"?>
<price>€ 42.00</price>
However, in an ASCII XML document the special character needs to be encoded as a numeric character reference. Named entities like &euro; will not work. They are specific to (X)HTML and not XML.
$document = new DOMDocument('1.0', 'ASCII');
$document
->appendChild($document->createElement('price'))
->appendChild($document->createTextNode('€ 42.00'));
echo $document->saveXml();
Output:
<?xml version="1.0" encoding="ASCII"?>
<price>&#8364; 42.00</price>
The same is possible with XMLWriter:
$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'ASCII');
$writer->writeElement("price", '€ 42.00');
$writer->endDocument();
echo $writer->outputMemory();
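The difference between the named entity and the numeric reference can be checked with a parser. The following is a small sketch, not part of the original answer, that feeds both variants to DOMDocument and collects the errors instead of printing warnings:

```php
<?php
// Collect libxml parse errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// &euro; is not declared in plain XML, so this document is not well-formed.
$named   = new DOMDocument();
$namedOk = $named->loadXML('<price>&euro; 42.00</price>');
$namedErrors = libxml_get_errors();
libxml_clear_errors();

// A numeric character reference needs no declaration.
$numeric   = new DOMDocument();
$numericOk = $numeric->loadXML('<price>&#8364; 42.00</price>');
$text      = $numericOk ? $numeric->documentElement->textContent : null;

var_dump($namedOk, $numericOk);
```

The first load should fail with an undefined-entity error, while the second one parses and yields the literal euro sign in the text content.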
If you generate the XML as text (usually not the best choice), you will have to take care of the encoding yourself:
echo '<?xml version="1.0" encoding="UTF-8"?>', "\n";
printf('<price>%s</price>', htmlentities('€ 42.00', ENT_XML1 | ENT_COMPAT, "UTF-8"));
Output:
<?xml version="1.0" encoding="UTF-8"?>
<price>€ 42.00</price>
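Applied to the file_put_contents approach from the question, the text-based variant could look roughly like this. It is only a sketch: the sample value and the temporary file path are made up for illustration, and htmlspecialchars is used here in place of htmlentities because only the XML special characters need escaping in a UTF-8 document.

```php
<?php
// Hypothetical value containing the euro sign plus XML special characters.
$price = "\u{20AC} 42.00 & <tax>";

// ENT_XML1 escapes only the XML special characters;
// the euro sign stays a literal UTF-8 character.
$escaped = htmlspecialchars($price, ENT_XML1 | ENT_QUOTES, 'UTF-8');

$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
     . '<price>' . $escaped . '</price>' . "\n";

// Hypothetical target path; a temporary file keeps the sketch self-contained.
$path = tempnam(sys_get_temp_dir(), 'price');
file_put_contents($path, $xml);

// Read the file back with a parser to confirm it is well-formed.
$check     = new DOMDocument();
$parsed    = $check->load($path);
$roundTrip = $parsed ? $check->documentElement->textContent : null;
unlink($path);
```

Parsing the written file before handing it to Word is a cheap way to catch escaping mistakes early.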