php iconv translit删除重音：不能作为例外工作？

consider this simple code:

echo iconv('UTF-8', 'ASCII//TRANSLIT', 'è');

it prints

`e

instead of just

do you know what I am doing wrong?

nothing changed after adding setlocale

setlocale(LC_COLLATE, 'en_US.utf8');
echo iconv('UTF-8', 'ASCII//TRANSLIT', 'è');

I have this standard function to return valid url strings without the invalid url characters. The magic seems to be in the line after the //remove unwanted characters comment.

This is taken from the Symfony framework documentation: http://www.symfony-project.org/jobeet/1_4/Doctrine/en/08 which in turn is taken from http://php.vrana.cz/vytvoreni-pratelskeho-url.php but i don't speak Czech ;-)

function slugify($text)
{
  // replace non letter or digits by -
  $text = preg_replace('#[^\\pL\d]+#u', '-', $text);

  // trim
  $text = trim($text, '-');

  // transliterate
  if (function_exists('iconv'))
  {
    $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
  }

  // lowercase
  $text = strtolower($text);

  // remove unwanted characters
  $text = preg_replace('#[^-\w]+#', '', $text);

  if (empty($text))
  {
    return 'n-a';
  }

  return $text;
}

echo slugify('é'); // --> "e"

When doing transliteration, you have to make sure that your LC_COLLATE is properly set, otherwise the default POSIX will be used.

Look at http://uk3.php.net/manual/en/function.setlocale.php

I'm tempted to say "nothing", although this is a little outside my expertise. PHP's iconv() is notorious, and the inspiration for many workarounds, including

dropping to the system's iconv utility (Unix & Linux)
crafting a lookup table
replacing all accented characters with an ASCII equivalent as kind of a preprocessing stage
setting LC_COLLATE (which doesn't seem to work for everyone)
use htmlentities() instead of iconv()

Read the comments for iconv() documentation for more inspiration. (Or commiseration. Too close to call.)

It seems the standard way to handle this is with a "removing accents" function which you can find in library's like flourish or CMS's like Wordpress. Iconv seems to be unable to translate accents (and rightly so) since this isn't a good idea for anything other than URL slugs.

cf @tchrist, with INTL php extension

http://fr2.php.net/manual/en/book.intl.php

preg_replace('/\pM*/u','',normalizer_normalize( $mystring, Normalizer::FORM_D));

eéèêëiîïoöôuùûüaâäÅ Ἥ ŐǟǠ ǺƶƈƉųŪŧȬƀ␢ĦŁȽŦ ƀǖ becomes

eeeeeiiiooouuuuaaaA Η OaA AƶƈƉuUŧOƀ␢ĦŁȽŦ ƀu

As tchrist emphasises, not all unicode characters are considered decomposable:

extract from Unicode charts:

U0080.pdf

00CF Ï LATIN CAPITAL LETTER I WITH DIAERESIS
≡ 0049 I 0308 ¨
NB this symbol « ≡ » indicate an available decomposition
00D0 Ð LATIN CAPITAL LETTER ETH
→ 00F0 ð latin small letter eth
→ 0110 Đ latin capital letter d with stroke
→ 0189 Ɖ latin capital letter african d

no decomposition available, IMHO strangely (we could consider ASCII letter D as an acceptable equivalent).

U0100.pdf

0110 Đ LATIN CAPITAL LETTER D WITH STROKE
→ 00D0 Ð latin capital letter eth
→ 0111 đ latin small letter d with stroke
→ 0189 Ɖ latin capital letter african d

even stranger: this one is identified as LATIN CAPITAL LETTER D (with stroke), but not decomposable as such! Perhaps a cooler solution should be to get the unicode description of each char, and compare it with the description of each ascii char (and replace accordingly). Anyone? ;-]

cf http://unicode.org/Public/UNIDATA/UnicodeData.txt

It happen with me with pure iconv without php. The Trick was to set LANG environment value to en_US.UTF-8 (it was hu_HU.UTF-8 before, in my case). After it worked as expected.

It seem that it depend of the php version...

TestCase #1

php -version

PHP 7.0.0RC8 (cli) (built: Nov 25 2015 12:36:50) ( NTS ) Copyright (c) 1997-2015 The PHP Group Zend Engine v3.0.0, Copyright (c) 1998-2015 Zend Technologies with Zend OPcache v7.0.6-dev, Copyright (c) 1999-2015, by Zend Technologies

php -r "var_dump(iconv('UTF-8', 'ASCII//TRANSLIT', 'è'));"

string(2) "`e"

TestCase #2

php -version

PHP 7.0.8-1~dotdeb+8.1 (cli) ( NTS ) Copyright (c) 1997-2016 The PHP Group Zend Engine v3.0.0, Copyright (c) 1998-2016 Zend Technologies with Zend OPcache v7.0.8-1~dotdeb+8.1, Copyright (c) 1999-2016, by Zend Technologies

php -r "var_dump(iconv('UTF-8', 'ASCII//TRANSLIT', 'è'));"

string(1) "e"