Convert special characters to ISO Latin-1 codes using PHP?

This is a follow-up to a post I made earlier.

I found out that XML takes numeric codes rather than named codes for special characters. I have searched online for a way to convert special characters into numeric codes, but I haven't had any luck.

Do I have to write a function for this, or does PHP come with a built-in function that would save me a lot of work?

For instance, I want to convert á to &#225; but not á to &aacute;

Is it possible?

Please help if you have any ideas.

EDIT:

I am using this suggestion to convert the special characters into numeric codes,

$txt = preg_replace('/([\x80-\xff])/e', "'&#' . ord('$1') . ';'", $txt);

but I just found out that it does not convert these 5 special characters into numeric codes: <, >, &, ' and ".

How can I get around them?

Thanks.

The generic approach is to use:

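// preg_replace_callback replaces the /e modifier, which was removed in PHP 7.
$txt = preg_replace_callback('/[\x80-\xff]/', function ($m) { return '&#' . ord($m[0]) . ';'; }, $txt);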

You must ensure that $txt is already encoded in Latin-1 (convert it with utf8_decode if it is UTF-8). Otherwise ord() sees the individual bytes of a multibyte sequence and you get the wrong values.
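The Latin-1 caveat can be illustrated with a minimal sketch. It assumes the input arrives as UTF-8 and uses mb_convert_encoding for the conversion; the helper name encode_high_bytes is hypothetical and only serves the example.

```php
<?php
// Hypothetical helper: replaces every byte in 0x80-0xFF with a numeric entity.
// It works on bytes, so it is only correct for single-byte Latin-1 input.
function encode_high_bytes(string $bytes): string
{
    return preg_replace_callback('/[\x80-\xff]/', function (array $m): string {
        return '&#' . ord($m[0]) . ';';
    }, $bytes);
}

$utf8 = "a\u{E1}b"; // "aáb" as UTF-8: á is the two bytes 0xC3 0xA1

// Convert to Latin-1 first, so that á becomes the single byte 0xE1.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
$right = encode_high_bytes($latin1);

// Skipping the conversion encodes each UTF-8 byte on its own.
$wrong = encode_high_bytes($utf8);

echo $right, "\n"; // a&#225;b
echo $wrong, "\n"; // a&#195;&#161;b
```

The second result shows why the conversion matters: the two UTF-8 bytes of á are turned into two unrelated entities.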

A neat function is presented here: http://www.sourcerally.net/Scripts/39-Convert-HTML-Entities-to-XML-Entities. You chain htmlentities with that function to get text->html->xml.

No, to date PHP has no built-in function such as xml_entities.
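The text->html->xml chain could be sketched roughly as follows. This is not the linked script, only an illustration under assumptions: the input is UTF-8, the HTML 4.01 entity table is sufficient, and the function name named_to_numeric_entities is hypothetical.

```php
<?php
// Hypothetical helper: rewrites named HTML entities as numeric ones.
// It inverts PHP's own translation table, so it only knows HTML 4.01 names.
function named_to_numeric_entities(string $html): string
{
    $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    $map = [];
    foreach ($table as $char => $entity) {
        // e.g. '&aacute;' => '&#225;'
        $map[$entity] = '&#' . mb_ord($char, 'UTF-8') . ';';
    }
    return strtr($html, $map);
}

// Step 1: text -> HTML with named entities.
$html = htmlentities("a\u{E1}b <c>", ENT_QUOTES | ENT_HTML401, 'UTF-8');
// Step 2: HTML named entities -> numeric entities usable in XML.
$xml = named_to_numeric_entities($html);

echo $html, "\n"; // a&aacute;b &lt;c&gt;
echo $xml, "\n";  // a&#225;b &#60;c&#62;
```

Note that mb_ord requires PHP 7.2 or later and the mbstring extension.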

Use mb_encode_numericentity. Example (assuming the script is encoded in UTF-8):

<?php
header("Content-type: text/plain");
echo mb_encode_numericentity("aáb",
    array(0x0080, 0x10FFFF, 0x0, 0xFFFFFF), "UTF-8");

would give:

a&#225;b

This example converts every non-ASCII character to its numeric entity. If you also want to encode the characters <, >, &, ' and ", which have special meaning in XML, use htmlspecialchars (or use mb_encode_numericentity and add those characters to the array in the second argument).
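A minimal sketch of the second option, with the five XML special characters added to the conversion map. It assumes the input is UTF-8, and the variable names are only illustrative.

```php
<?php
// Each group of four integers is: range start, range end, offset, mask.
$convmap = [
    0x22, 0x22,     0, 0xFFFFFF, // "
    0x26, 0x27,     0, 0xFFFFFF, // & and '
    0x3C, 0x3C,     0, 0xFFFFFF, // <
    0x3E, 0x3E,     0, 0xFFFFFF, // >
    0x80, 0x10FFFF, 0, 0xFFFFFF, // everything outside ASCII
];

$input = "a<\u{E1}>&\"'"; // a<á>&"'
$encoded = mb_encode_numericentity($input, $convmap, 'UTF-8');

echo $encoded, "\n"; // a&#60;&#225;&#62;&#38;&#34;&#39;
```

Characters outside the listed ranges, such as plain ASCII letters, pass through unchanged.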

Note, however, that if your XML file is encoded in UTF-8, you only need to encode a few characters (á is not one of them). See here for an appropriate conversion map to use in mb_encode_numericentity (it covers the XML special characters <, >, &, ' and " and also encodes characters that are forbidden to appear literally in an XML document, like U+0000).