This is a follow-up to a post I made earlier.
I found out that XML takes numeric codes rather than named codes for special characters. I have searched online for a way to convert special characters into numeric codes, but I haven't had any luck.
Do I have to write a function for this, or does PHP come with a built-in function that would save me the work?
For instance, I want to convert á to &#225;, not á to &aacute;.
Is it possible?
Please help if you have any ideas.
EDIT:
I am using this suggestion to convert the special characters into numeric codes,
$txt = preg_replace('/([\x80-\xff])/e', "'&#' . ord('$1') . ';'", $txt);
but I just found out that it does not convert these 5 special characters into numeric codes: <, >, &, ' and ".
How can I get around them?
Thanks.
The generic approach is to use:
$txt = preg_replace('/([\x80-\xff])/e', "'&#' . ord('$1') . ';'", $txt);
You must ensure that $txt really is Latin-1 already (for example by passing it through utf8_decode first), because otherwise ord() sees the individual bytes of a multi-byte sequence and you get the wrong value for the character.
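The /e modifier was deprecated in PHP 5.5 and removed in PHP 7, so on current versions the same idea can be sketched with preg_replace_callback. This is only a minimal sketch: the helper name latin1_to_numeric_entities is made up for illustration, and the input is assumed to be ISO-8859-1 (one byte per character) already.

```php
<?php
// Hypothetical helper; assumes $txt is ISO-8859-1, so every byte is one character.
function latin1_to_numeric_entities(string $txt): string
{
    return preg_replace_callback(
        '/[\x80-\xff]/',                     // any byte outside ASCII
        function (array $m): string {
            return '&#' . ord($m[0]) . ';';  // byte value == Latin-1 code point
        },
        $txt
    );
}

echo latin1_to_numeric_entities("a\xE1b"), "\n"; // a&#225;b
```

Like the one-liner it mirrors, this leaves <, >, &, ' and " untouched.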
A neat function is presented here: http://www.sourcerally.net/Scripts/39-Convert-HTML-Entities-to-XML-Entities. You chain htmlentities with the function presented there to go text -> HTML -> XML.
No, PHP has no built-in function like xml_entities to date.
Use mb_encode_numericentity. Example (assuming the script is encoded in UTF-8):
<?php
header("Content-type: text/plain");
echo mb_encode_numericentity("aáb",
array(0x0080, 0x10FFFF, 0x0, 0xFFFFFF), "UTF-8");
would give:
a&#225;b
This example converts every character outside ASCII to its numeric entity. If you also want to encode the characters <, >, &, ' and ", which have special meaning in XML, use htmlspecialchars (or use mb_encode_numericentity, adding those characters to the array in the second argument).
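Both options could look roughly like the sketch below. The function names xml_escape_then_encode and xml_all_numeric are hypothetical, the input is assumed to be valid UTF-8, and the mbstring extension has to be loaded. The first variant produces the predefined XML entities (&lt;, &amp; and so on) for the five special characters; the second produces numeric codes for them as well.

```php
<?php
// Sketch only: both function names are hypothetical and the input is assumed to be UTF-8.

// Variant 1: escape the XML special characters first, then encode everything above ASCII.
// The order matters: running htmlspecialchars last would turn "&#225;" into "&amp;#225;".
function xml_escape_then_encode(string $txt): string
{
    $escaped = htmlspecialchars($txt, ENT_QUOTES | ENT_XML1, 'UTF-8');
    return mb_encode_numericentity($escaped, [0x80, 0x10FFFF, 0, 0xFFFFFF], 'UTF-8');
}

// Variant 2: a single pass, with the five special characters added to the conversion map.
// Each group of four values is: first code point, last code point, offset, mask.
function xml_all_numeric(string $txt): string
{
    $map = [
        0x22, 0x22, 0, 0xFFFFFF,     // "
        0x26, 0x27, 0, 0xFFFFFF,     // & and '
        0x3C, 0x3C, 0, 0xFFFFFF,     // <
        0x3E, 0x3E, 0, 0xFFFFFF,     // >
        0x80, 0x10FFFF, 0, 0xFFFFFF, // everything outside ASCII
    ];
    return mb_encode_numericentity($txt, $map, 'UTF-8');
}

$sample = "a<\u{E1}>&\"'"; // \u{E1} is á
echo xml_escape_then_encode($sample), "\n"; // a&lt;&#225;&gt;&amp;&quot;&apos;
echo xml_all_numeric($sample), "\n";        // a&#60;&#225;&#62;&#38;&#34;&#39;
```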
Note, however, that if your XML file is encoded in UTF-8, you only need to encode a few characters (á is not one of them). See here for an appropriate conversion map to use in mb_encode_numericentity (it covers the XML special characters <, >, &, ' and ", and also encodes characters that are forbidden to appear literally in an XML document, like U+0000).