Here's my problem: I want to check whether a user has entered a real name and surname by checking that they contain only letters (of any alphabet) plus ' or -, in PHP. I found a solution here (but I don't remember the link) for checking that a string contains only letters:
preg_match('/^[\p{L} ]+$/u',$name)
but I'd like to allow ' and - too. (The charset is UTF-8.) Can anyone help me, please?
Looks like you just need to modify the regex: [\p{L}' -]+
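A minimal sketch of that change applied to the pattern from the question, keeping its anchors and the u modifier. The helper name is_valid_name is made up for illustration, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: true if $name contains only letters, spaces, ' and -.
function is_valid_name(string $name): bool
{
    // The hyphen sits last in the class so it is taken literally, not as a range.
    return preg_match('/^[\p{L}\' -]+$/u', $name) === 1;
}

var_dump(is_valid_name("Anne-Marie O'Neil")); // bool(true)
var_dump(is_valid_name("Жан-Поль"));          // bool(true)
var_dump(is_valid_name("R2-D2"));             // bool(false), digits are not letters
```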
(International) names can contain many characters: spaces, apostrophes, dashes, normal letters, umlauts, accents, ...
EDIT: The point is: how can you be sure that all letters (of all languages), the dash, ' and the space are enough? Are there no names which contain a dot (what about "Dr. No"?), a colon or some other character?
EDIT2: Thanks to the user 'some', probably from Sweden (who left a comment), we now know that there is a Swedish name 'Andreas J:son Friberg'. Remember the colon!
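If names like these should pass, the character class can simply be widened. A sketch, assuming the dot and the colon are the only extra characters wanted; the helper name is_plausible_name is hypothetical, and the list of allowed characters is still a guess about what real names contain.

```php
<?php
// Hypothetical helper: letters plus space, apostrophe, dot, colon and hyphen.
function is_plausible_name(string $name): bool
{
    // Inside a character class the dot and colon are literal; the hyphen stays last.
    return preg_match('/^[\p{L} \'.:-]+$/u', $name) === 1;
}

var_dump(is_plausible_name('Andreas J:son Friberg')); // bool(true)
var_dump(is_plausible_name('Dr. No'));                // bool(true)
var_dump(is_plausible_name('Robert); DROP TABLE'));   // bool(false)
```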
This should also do it:
/^[\w'-]+$/u
Depending on the characters you want to permit, you just need to make sure they are inside the '[]' portion of the regex. Since the '-' character has special meaning in this context (it creates a range), it needs to be the first or last item in the list, or be escaped as '\-'.
The \p{L} means match any character with the property of being a letter. \w has a similar meaning, but also matches digits and the '_' character, which you probably don't want.
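The difference between the two classes is easy to see side by side. A small sketch, with a made-up helper name; both patterns use the u modifier so that they treat the input as UTF-8.

```php
<?php
// Hypothetical helper: reports whether the whole string matches \w+ and \p{L}+.
function word_vs_letter(string $s): array
{
    return [
        'w' => preg_match('/^\w+$/u', $s),     // letters, digits and underscore
        'L' => preg_match('/^\p{L}+$/u', $s),  // letters only
    ];
}

print_r(word_vs_letter('Jose'));    // both match
print_r(word_vs_letter('user_42')); // only \w matches
```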
preg_match('/^[A-Za-z \'-]+$/i',$name);
This would match most common English names, though if you want to support other character sets, you'll need a more exotic regex.
A little off-topic, but what exactly is the point of validating names?
It's not to prevent fraud; if people are trying to give you a fake name, they can easily type a string of random letters.
It's not to prevent mistakes; typing a punctuation character is only one of the many mistakes you could make, and an unlikely one at that.
It's not to prevent code injection; you should be preventing that by properly encoding your outputs, regardless of what characters they contain.
So why do we all do it?
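On the injection point above: a minimal sketch of encoding at output time instead of rejecting characters at input time, assuming the name is being written into an HTML page. The helper name render_name is hypothetical; htmlspecialchars is the standard PHP function.

```php
<?php
// Hypothetical helper: escape a stored name for safe inclusion in HTML.
function render_name(string $name): string
{
    // ENT_QUOTES escapes both quote styles; ENT_SUBSTITUTE replaces invalid UTF-8.
    return htmlspecialchars($name, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo render_name("O'Neil <script>"), "\n";
```

With this in place the stored name can keep whatever characters the user typed, and the markup characters are neutralised only where the value is printed.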
If the charset is UTF-8, then you have a problem: how will you check for Central and Eastern European Latin characters (diacritics), or for names in Cyrillic, Chinese or Japanese? That would be a hell of a regex.
Note that the example you provided does not check to ensure that the user has both a surname and given names, though I would argue that that is how it should be. You shouldn't assume a person has more than one name. I am currently working on a PHP application which deals with people's names in context, and if I have discovered anything it's that you cannot make such assumptions :) Even many non-celebrities have just one name.
Using the Unicode categories as in \p{L} was a good idea, since people will obviously have all sorts of characters from other languages in their names. However, as well as \p{L} you will also have to take into account combining marks, i.e. accents, umlauts, etc. that are encoded as separate characters following the base letter.
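So, maybe immediately after \p{L} I'd add \p{M}, which covers all combining marks (\p{Mc} alone covers only spacing marks, while most accents are non-spacing marks, \p{Mn}).
I'd end up with
preg_match('/^[\pL\pM \'-]+$/u', $name)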
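To see which mark category matters in practice, here is a small sketch that tests a name whose accent is stored as a separate combining character. The helper name class_accepts is made up for illustration, and the "\u{...}" escape assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: true if the whole of $name matches the given pattern.
function class_accepts(string $pattern, string $name): bool
{
    return preg_match($pattern, $name) === 1;
}

// "é" written as "e" followed by U+0301 COMBINING ACUTE ACCENT (category Mn).
$decomposed = "Rene\u{0301}";

// \p{Mc} covers only spacing combining marks, so the accent is rejected.
var_dump(class_accepts('/^[\pL\p{Mc} \'-]+$/u', $decomposed)); // bool(false)

// \p{M} covers every mark category (Mn, Mc, Me), so the name passes.
var_dump(class_accepts('/^[\pL\pM \'-]+$/u', $decomposed));    // bool(true)
```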