Regular expression for checking valid input in a form

I am editing an existing project, and a bug was reported: the system rejects input that starts with a number, such as 99% Creative. If you enter Creative 99% instead, the system accepts it and saves it to the database.

I checked the existing code and found that it uses the expression /^[\p{L}]+/u in preg_match. I searched around and found a suggestion to use \w, but my senior won't accept my answer. He said it is unacceptable and would cause issues.

Please check the script below; this is the one I used:

$category = trim($_POST['category']);
// \W matches any single non-word character, so any such character is rejected
if (preg_match('/\W/', $category)) {
    $error = 'Invalid Input';
}

He told me to test it with some Unicode characters, to see whether it accepts Arabic text and mixed input (alphanumeric + Arabic). It all worked, so I assumed it was fine, but he still doesn't accept it. He told me to come up with another regular expression.

Do you have any idea what I should do? What expression could I use? I really don't understand why \w is not acceptable.

The PHP manual on PCRE escape sequences says that in PCRE the characters matched by \w depend on the locale (and \W is the negation of \w).

This is unacceptable in a global environment, because you do not know which locale the server uses. What was the last setlocale() call? Is the site hosted in the U.S.A., in France, in China? You never know. And what applies today might not apply tomorrow. Users change, and sites move.

For example, your senior's server might use a different locale than yours. So they could be right in saying that it does not work for them with certain characters that are not in their current locale.

That is why they cannot accept your solution. You need to use Unicode character properties instead.

\p{L} means any Unicode letter, while \w means [a-zA-Z0-9_] in the default locale and without the u modifier.

Instead of \w, you could use:

[\p{L}\p{N}]+

That means one or more letters or digits. Use it with the u modifier so that the pattern and the subject are treated as UTF-8.
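As a rough sketch of how that character class could be used to validate the whole field, the helper below anchors the pattern at both ends. The function name isValidCategory is hypothetical, and allowing the space and the percent sign is an assumption based on the sample input 99% Creative; adjust the class to whatever the form should really accept.

```php
<?php
// Hypothetical helper, not part of the original project.
// Assumption: a valid category consists only of Unicode letters,
// Unicode digits, spaces, and the percent sign.
function isValidCategory(string $category): bool
{
    // ^ and \z anchor the match to the entire string;
    // the u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/^[\p{L}\p{N} %]+\z/u', $category) === 1;
}

$category = trim($_POST['category'] ?? '');
if (!isValidCategory($category)) {
    $error = 'Invalid Input';
}
```

Because the check relies on Unicode properties rather than \w, it should not depend on the server's setlocale() setting.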

/^[\p{L}]+/u

means "match a string that starts with Unicode letters". It will match Creative in the string Creative 99%.

By default, \w in PCRE regexes matches ASCII letters, digits, and underscore.

If you add digits to your character class, strings that start with digits will also be accepted.

/^[\p{L}\p{N}]+/u

will match 99 in the string 99% Creative.
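To see the difference between the two patterns, here is a small sketch that returns whatever each pattern matches at the start of a string. The helper name leadingMatch is made up for this illustration.

```php
<?php
// Hypothetical helper: returns the text matched by $pattern,
// or null when the pattern does not match at all.
function leadingMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

// Letters only: matches when the string starts with a letter.
var_dump(leadingMatch('/^[\p{L}]+/u', 'Creative 99%'));        // string(8) "Creative"
var_dump(leadingMatch('/^[\p{L}]+/u', '99% Creative'));        // NULL

// Letters and digits: a leading number is now accepted.
var_dump(leadingMatch('/^[\p{L}\p{N}]+/u', '99% Creative'));   // string(2) "99"
```

Note that both patterns only check the beginning of the string; neither one validates the characters that follow the first match.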