Replacing multibyte UTF-8 characters in PHP

I am trying to use preg_replace to turn the multibyte UTF-8 euro character (shown as ⬠in my HTML) into a "$", and each * into an "@".

$orig = "2 **** reviews  ⬠19,99 price";
$orig = mb_ereg_replace(mb_convert_encoding('€', 'UTF-8', 'HTML-ENTITIES'), "$", $orig);
$orig = preg_replace("/[\$\;\?\!\{\}\(\)\[\]\/\*\>\<]/", "@", $orig);
$a = htmlentities($orig);
$b = html_entity_decode($a);

The "*" characters are replaced, but the "â¬" is not.

I also tried replacing it with:

$orig = preg_replace("/[\xe2\x82\xac]/", "$", $orig);

That doesn't convert it either.

Another attempt that didn't work:

$orig= mb_ereg_replace(mb_convert_encoding('&#x20ac;', 'UTF-8', 'HTML-ENTITIES'), "$", $orig);

Brrr, does anyone know how to get rid of this UTF-8 euro character:

echo html_entity_decode('&euro;');

(driving me nuts)

Pasting my comment here as an answer so you can mark it!

Wouldn't

str_replace(html_entity_decode('&euro;'), '$', $source)

work?
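A minimal sketch of that idea, assuming the source string is valid UTF-8 and contains a real euro sign. The sample string and variable names are illustrative only; passing the charset to html_entity_decode explicitly avoids depending on the default of the installed PHP version.

```php
<?php
// Assumption: $source is valid UTF-8 with a real euro sign (bytes E2 82 AC).
$source = "2 reviews \xE2\x82\xAC 19,99 price";

// Decode the entity to UTF-8 explicitly rather than relying on the default charset.
$euro = html_entity_decode('&euro;', ENT_COMPAT, 'UTF-8');

// str_replace compares raw bytes, so no regex or /u modifier is needed.
$result = str_replace($euro, '$', $source);

echo $result, "\n";
```

If the euro sign in the source is already damaged, this will find nothing to replace, which is itself a useful diagnostic.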

This could have one of two causes:

  1. The source text is UTF-8 encoded, but your PHP file is not. You can solve this by using the line below and saving your file as UTF-8 (Notepad++ can do this, for example).

    str_replace('€', '$', $source);

  2. The source text is corrupted: its multibyte characters were treated as Latin-1 somewhere along the way (wrong database charset?). You can try to undo that with utf8_decode:

    str_replace('€', '$', utf8_decode($source))
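The second case can be sketched as follows, assuming the text was encoded to UTF-8 twice via ISO-8859-1 and that the mbstring extension is available. The sample bytes are a hypothetical illustration, not taken from the question. mb_convert_encoding performs the same conversion as utf8_decode here, and utf8_decode is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical input: a euro sign (E2 82 AC) that was read as ISO-8859-1
// and then encoded to UTF-8 a second time.
$source = "\xC3\xA2\xC2\x82\xC2\xAC 19,99";

// Undo one layer of encoding (the same conversion utf8_decode() performs).
$repaired = mb_convert_encoding($source, 'ISO-8859-1', 'UTF-8');

// Only keep the repaired string if it is valid UTF-8 again.
if (mb_check_encoding($repaired, 'UTF-8')) {
    $source = $repaired;
}

// Replace the real euro sign, written as bytes so the file encoding does not matter.
$result = str_replace("\xE2\x82\xAC", '$', $source);

echo $result, "\n";
```

Writing the euro sign as a byte escape also sidesteps the first case, because the replacement no longer depends on how the PHP file itself is saved.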

Your $orig string does not contain a euro sign. When I run this PHP file:

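<?php
// Dump every byte of the string as hex to see what it really contains.
$orig = "â¬";
for ($i = 0; $i < strlen($orig); $i++) {
    // $orig[$i] replaces the $orig{$i} syntax, which was removed in PHP 8.
    echo '0x' . dechex(ord($orig[$i])) . ' ';
}
?>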

If saved as utf-8 I get: 0xc3 0xa2 0xc2 0xac

If saved as latin-1 I get: 0xe2 0xac

Either way, it is not the € sign, which is 0xE2 0x82 0xAC in UTF-8, or Unicode U+20AC ( http://www.fileformat.info/info/unicode/char/20ac/index.htm ). The 0x82 byte is missing!

Run the program above, see what you get, and use those hex values to get rid of â¬.
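As a rough sketch of that last step, assuming the dump printed one of the two damaged byte sequences quoted above (the sample string is illustrative, and your own dump may show different bytes):

```php
<?php
// Assumption: these are the byte sequences the hex dump reported for "â¬".
$damaged = [
    "\xC3\xA2\xC2\xAC", // "â¬" stored as UTF-8
    "\xE2\xAC",         // "â¬" stored as Latin-1
];

// Illustrative input containing the UTF-8 form of the damaged sequence.
$orig = "2 **** reviews \xC3\xA2\xC2\xAC 19,99 price";

// str_replace works on raw bytes and tries each search string in order.
$orig = str_replace($damaged, '$', $orig);

echo $orig, "\n";
```

Note that 0xE2 0xAC can also be the start of a valid three-byte UTF-8 character, so it is safer to list only the sequence your own data actually contains.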

For a real euro sign, this works:

<?php
    $orig = html_entity_decode('&euro;', ENT_COMPAT, 'UTF-8');
    $dest = preg_replace('~\x{20ac}~u', '$', $orig);

    echo "($orig) ($dest)";
?>

BTW, if a UTF-8 file containing € is displayed as Latin-1 (Windows-1252), you should see â‚¬, one character for each of the three bytes, and not ⬠as in your example.
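To see how an intact euro sign looks when misread, one can reinterpret its three UTF-8 bytes as Windows-1252. This is only a sketch, assuming the mbstring extension is available; the variable names are illustrative.

```php
<?php
// The real euro sign as UTF-8 bytes.
$euro = "\xE2\x82\xAC";

// Pretend those bytes are Windows-1252 and convert them to UTF-8,
// which is what a browser using the wrong charset effectively displays.
$misread = mb_convert_encoding($euro, 'UTF-8', 'Windows-1252');

// Three characters come out, one per original byte.
echo $misread, ' (', mb_strlen($misread, 'UTF-8'), " characters)\n";
```

If the page shows only two characters instead of three, a byte was already lost before the text reached the browser.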

So the real problem is with encoding and conversion between encodings. If you try to save € as Latin-1, the middle byte will be lost (my Komodo, for example, alerts me and then replaces it with ?). In other words, you somehow damaged your € sign and then tried to replace it as if it were intact. :D