How to check for a real first name and surname - PHP

Here's my problem: I want to check whether a user has entered a real name and surname by checking that they contain only letters (of any alphabet) plus ' or -, in PHP. I found a solution here (I don't remember the link) for checking that a string contains only letters:

preg_match('/^[\p{L} ]+$/u',$name)

but I'd like to allow ' and - too. (The charset is UTF-8.) Can anyone help me, please?

Looks like you just need to modify the regex: [\p{L}' -]+
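As a minimal sketch of how that modified pattern could be used from PHP: the function name isPlausibleName is hypothetical, and the D modifier is an addition not present in the original pattern (it stops $ from matching before a trailing newline).

```php
<?php
// Hypothetical helper: true when $name consists only of letters (any script),
// spaces, apostrophes, and hyphens.
function isPlausibleName(string $name): bool
{
    // u: treat pattern and subject as UTF-8, which enables \p{L}.
    // D: make $ match only at the very end of the subject.
    // The hyphen is placed last in the class so it is read literally.
    return preg_match('/^[\p{L}\' -]+$/uD', $name) === 1;
}

var_dump(isPlausibleName("Anne-Marie O'Neil")); // bool(true)
var_dump(isPlausibleName("Łukasz Żółć"));       // bool(true)
var_dump(isPlausibleName("R2-D2"));             // bool(false): digits are not letters
```

The strict comparison with 1 matters because preg_match returns false on error, for example when the subject is not valid UTF-8.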

(International) names can contain many characters: spaces, apostrophes, dashes, plain letters, umlauts, accents, ...

EDIT: The point is: how can you be sure that all letters (of all languages), the dash, ' and the space are enough? Are there no names that contain a dot (what about "Dr. No"?), a colon, or some other character?

EDIT2: Thanks to the user 'some', probably from Sweden (who left a comment), we now know that there is a Swedish name 'Andreas J:son Friberg'. Remember the colon!
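If you decide to accept those two cases as well, the character class can be widened. This is only a sketch: the function name acceptsWiderName is hypothetical, and treating dot and colon as the only extra characters is an assumption, since the list of characters that real names may contain is open-ended.

```php
<?php
// Hypothetical helper: letters, space, apostrophe, dot, colon, and hyphen.
function acceptsWiderName(string $name): bool
{
    // Inside a character class, '.' and ':' are literal; the hyphen stays last.
    return preg_match('/^[\p{L} \'.:-]+$/uD', $name) === 1;
}

foreach (['Dr. No', 'Andreas J:son Friberg', 'name; with semicolon'] as $candidate) {
    printf("%s => %s\n", $candidate, acceptsWiderName($candidate) ? 'accepted' : 'rejected');
}
```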

This should also do it

/[\w'-]+/i

Depending on the character set you want to permit, you just need to make sure that the characters you want to support are inside the '[]' portion of the regex. Since the '-' character has a special meaning in this context (it creates a range), it needs to be the first or last item in the list, or be escaped as '\-'.

The \p{L} means match any character with the property of being a letter. \w has a similar meaning, but it also matches digits and the '_' character, which you probably don't want.
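A small sketch of both points, using made-up test strings: it compares \w with \p{L} on a string containing an underscore and digits, and shows how the position of the hyphen changes what a class matches.

```php
<?php
// \w accepts the underscore and digits; \p{L} accepts letters only.
$wordClass   = preg_match('/^[\w\'-]+$/', 'user_42');      // 1
$letterClass = preg_match('/^[\p{L}\'-]+$/u', 'user_42');  // 0

// Between two characters the hyphen forms a range: [a-c] means a, b, or c.
$asRange   = preg_match('/^[a-c]+$/', 'a-c');   // 0, '-' itself is not in the range
// Placed last, or escaped, the hyphen is a literal character.
$asLast    = preg_match('/^[ac-]+$/', 'a-c');   // 1
$asEscaped = preg_match('/^[a\-c]+$/', 'a-c');  // 1

var_dump($wordClass, $letterClass, $asRange, $asLast, $asEscaped);
```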

preg_match('/^[A-Za-z \'-]+$/i',$name);

This would match most common names, though if you want to support foreign character sets, you'll need a more exotic regex.

A little off-topic, but what exactly is the point of validating names?

It's not to prevent fraud; if people are trying to give you a fake name, they can easily type a string of random letters.

It's not to prevent mistakes; typing a punctuation character is only one of the many mistakes you could make, and an unlikely one at that.

It's not to prevent code injection; you should be preventing that by properly encoding your outputs, regardless of what characters they contain.

So why do we all do it?
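The output-encoding point can be sketched as follows, assuming the name is being written into an HTML page. The sample value is made up; other output contexts (SQL, JSON, shell) need their own escaping or parameter binding.

```php
<?php
// Hypothetical input: a legitimate apostrophe plus some markup.
$name = "O'Neil <script>alert(1)</script>";

// Encode at output time for the HTML context, whatever characters the name contains.
$safe = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

echo "<p>Hello, {$safe}</p>\n";
// Prints: <p>Hello, O&#039;Neil &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

With this in place, the apostrophe in a real name and the markup in a hostile one are both rendered as plain text.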

If the charset is UTF-8, then you have a problem: how will you check for Central and Eastern European Latin characters (diacritics), or for names in Cyrillic, Chinese, or Japanese? That would be a hell of a regex.

Note that the example you provided does not check to ensure that the user has both a surname and given names, though I would argue that that is how it should be. You shouldn't assume a person has more than one name. I am currently working on a PHP application which deals with people's names in context, and if I have discovered anything it's that you cannot make such assumptions :) Even many non-celebrities have just one name.

Using the Unicode categories as in \p{L} was a good idea, since people will obviously have all sorts of characters from other languages in their names. However, as well as \p{L}, you will also have to take into account combining marks, i.e. accents, umlauts, etc. that are encoded as separate characters following the base letter.

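So, immediately after \p{L} I'd add \p{M}, which covers all combining marks. (\p{Mc} alone matches only spacing combining marks and would miss common nonspacing accents such as U+0301, which belong to \p{Mn}.)

I'd end up with

preg_match('/^[\p{L}\p{M} \'-]+$/u', $name)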
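To see what combining marks change in practice, here is a small sketch using a decomposed "José", where the é is stored as 'e' followed by U+0301 COMBINING ACUTE ACCENT. It assumes PHP 7 or later for the "\u{...}" escape. U+0301 is in the nonspacing category Mn, so the sketch tries the general \p{M} as well as \p{Mc}.

```php
<?php
// 'e' followed by U+0301 COMBINING ACUTE ACCENT (general category Mn).
$decomposed = "Jose\u{0301}";

$lettersOnly = preg_match('/^[\p{L} \'-]+$/u', $decomposed);        // 0: the mark is not a letter
$spacingOnly = preg_match('/^[\p{L}\p{Mc} \'-]+$/u', $decomposed);  // 0: Mc excludes nonspacing marks
$allMarks    = preg_match('/^[\p{L}\p{M} \'-]+$/u', $decomposed);   // 1: M covers Mn, Mc, and Me

var_dump($lettersOnly, $spacingOnly, $allMarks);
```

An alternative would be to normalize the input to a composed form before matching, but that only helps for letters that have a precomposed code point.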