preg_replace and ISO-8859-1 character matching [closed]

I have a problem with the preg_replace function.

In my code I use it to delete all characters that are not letters or digits at the beginning of a string.

This works well, but when $string contains ISO-8859-1 characters like "è, ò, à, ù, é, ì", they are not treated as letters and are removed too.

preg_replace('/^[^a-zA-Z0-9]+/', '', $string);

For example, if $string contains èxample, the output is xample.

I need these ISO-8859-1 vowels to be kept.

Does anyone have a solution to this?

PCRE does not support Unicode blocks (which would make things much easier), so you have to specify the set of allowed characters (or its negation) manually. The regex would look like this:

[^a-zA-Z0-9\xC0-\xFF]+

The problem is that the range \xC0-\xFF also includes undesirable characters (e.g. the division sign, \xF7), so you have to break it down into acceptable subranges depending on your requirements. Look at the codepage layout to help decide which characters are OK and which are not.
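As a rough sketch of that breakdown, assuming the input really is ISO-8859-1 bytes and that you want every accented letter in \xC0-\xFF except the multiplication sign (\xD7) and the division sign (\xF7), the class could be split into three subranges. The function name below is made up for illustration, and the exact subranges are an assumption you should adjust to your own requirements:

```php
<?php
// Hypothetical helper; assumes $s is an ISO-8859-1 encoded byte string.
function stripLeadingNonAlnumLatin1(string $s): string
{
    // Allowed: ASCII letters and digits, plus \xC0-\xD6, \xD8-\xF6, \xF8-\xFF.
    // \xD7 (multiplication sign) and \xF7 (division sign) are deliberately left out.
    // No "u" modifier: the pattern works on single bytes, which matches ISO-8859-1.
    $result = preg_replace('/^[^a-zA-Z0-9\xC0-\xD6\xD8-\xF6\xF8-\xFF]+/', '', $s);

    // preg_replace returns null on error; fall back to the unchanged input.
    return $result ?? $s;
}

// "--÷èxample" written with byte escapes so the source file encoding does not matter.
$input  = "--\xF7\xE8xample";
$output = stripLeadingNonAlnumLatin1($input);
```

Here $output keeps the leading è (byte \xE8) and drops the two hyphens and the division sign in front of it.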

Try

$string = "1èxample";
$r = preg_replace('/^[^\p{L}\p{Nd}]+/', '', $string);

echo $r;

Tested on http://writecodeonline.com/php/

Output:

1èxample

\p{L} matches any letter in any language, so accented letters are covered as well.

\p{Nd} matches any decimal digit in any script.

See Unicode Character Properties on regular-expressions.info for more details.
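One caveat, offered as a sketch rather than a definitive recipe: if the string is UTF-8 rather than ISO-8859-1, the pattern normally needs the u modifier so that PCRE reads è as one character instead of two bytes. The function name below is hypothetical, and the example assumes valid UTF-8 input:

```php
<?php
// Hypothetical helper; assumes $s is valid UTF-8.
function stripLeadingNonAlnumUtf8(string $s): string
{
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('/^[^\p{L}\p{Nd}]+/u', '', $s);

    // preg_replace returns null on error (for example, invalid UTF-8);
    // fall back to the unchanged input in that case.
    return $result ?? $s;
}

// "?!èxample" with è written as its UTF-8 byte sequence.
$utf8 = "?!\xC3\xA8xample";
echo stripLeadingNonAlnumUtf8($utf8), "\n";
```

With an ISO-8859-1 string, the u modifier would reject bytes such as \xE8 as invalid UTF-8, so leave it off in that case or convert the string to UTF-8 first.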