Regex to match all undefined XML entities

XML, unlike HTML, only knows five named entities: &lt;, &gt;, &amp;, &apos; and &quot;.

I am using XMLWriter in PHP to write a lot of data to an XML file. I escape the text first, which produces other entities that XML does not define, such as &nbsp; and &curren;.

I have tried the following regex:

&(?!(apos|quot|[gl]t|amp);)

but it only matches the & and not the whole entity, such as &nbsp; or &curren;. What am I doing wrong?

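The negative lookahead is zero-width: it checks what follows the & but consumes nothing, so your pattern only ever matches the & itself. If you add \w+; after the lookahead, the entity name and the semicolon become part of the match and it will work: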

&(?!(?:apos|quot|[gl]t|amp);)\w+;

But you are better off using the correct escaping function from the start, so that these entities are never produced in the first place.
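
To see the difference between the two patterns, here is a minimal sketch using preg_match_all(). The sample string and the variable names are made up for illustration; only the two patterns come from the discussion above.

```php
<?php
// Hypothetical sample input mixing predefined and undefined entities.
$text = 'Tom &amp; Jerry&nbsp;cost 5&curren; &lt;today&gt;';

// Pattern from the question: the lookahead consumes nothing,
// so each match is just the "&" character.
$original = '/&(?!(apos|quot|[gl]t|amp);)/';

// Corrected pattern: \w+; also consumes the entity name and the semicolon.
$corrected = '/&(?!(?:apos|quot|[gl]t|amp);)\w+;/';

preg_match_all($original, $text, $m1);
preg_match_all($corrected, $text, $m2);

var_export($m1[0]); // two matches, each just '&'
echo "\n";
var_export($m2[0]); // '&nbsp;' and '&curren;'
echo "\n";
```

Note that \w+ does not match numeric character references such as &#160;, which is fine here because those are valid in XML anyway.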

Could you not just use strip_tags() (with a list of allowed tags) instead of htmlentities()?

Do not escape the entities yourself. Let the XMLWriter do the needed escaping.

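// Build the document in memory instead of writing to a file
$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'UTF-8');

$writer->startElement('root');
// Pass the raw, unescaped text; text() escapes &, < and > itself
$writer->text('A & B & <C>');
$writer->endElement();

$writer->endDocument();
// TRUE flushes the buffer after returning its contents
echo $writer->outputMemory(TRUE);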

Output:

<?xml version="1.0" encoding="UTF-8"?>
<root>A &amp; B &amp; &lt;C&gt;</root>
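
If some of the text has already been run through an HTML escaping function, one possible way to recover is to decode it back to plain UTF-8 and then let XMLWriter escape it. This is only a sketch under assumptions: the input string is invented, and it assumes the data is UTF-8 and was escaped exactly once.

```php
<?php
// Hypothetical input: text that was already escaped with HTML entities.
$escaped = 'caf&eacute;&nbsp;5&curren; &amp; more';

// Turn named HTML entities back into the characters they stand for.
$plain = html_entity_decode($escaped, ENT_QUOTES | ENT_HTML5, 'UTF-8');

$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('root');
$writer->text($plain); // XMLWriter applies the XML escaping here
$writer->endElement();
$writer->endDocument();

$xml = $writer->outputMemory(true);
echo $xml;
```

After this, the output should contain no named entities other than the ones XML predefines, so the regex is no longer needed.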