I want to know how this regular expression expands and how it validates a proper e-mail address:
"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)↪*(\.[a-z]{2,3})$"
What is a valid email address? Usually it is name@FQDN,
where FQDN stands for "fully qualified domain name". An FQDN must consist of one hostname and one top-level domain. It CAN have one or more optional subdomains.
^
start of match (see below)
[_a-z0-9-]+
the first part of a valid email account name: one or more characters from this set (required, hence the + quantifier, 1..n)
(\.[_a-z0-9-]+)*
optional further parts of the account name, each a literal dot followed by one or more characters from the same set (hence the * quantifier, 0..n)
@
@-literal (delimiting account name from FQDN)
[a-z0-9-]+
first label of the domain name (hostname or subdomain), one or more characters
(\.[a-z0-9-]+)*
optional further domain labels, each introduced by a literal dot (second-level domains / subdomains)
(\.[a-z]{2,3})
top-level domain of two or three letters (.com, .net, etc.); this won't work for the 'new' TLDs like .info, .business, .museum and so on
$
end of match
The ^ ... $ pair is often used to declare that the whole string must consist of the pattern (and not just include it somewhere).
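To make the breakdown above concrete, here is a small Python sketch that annotates the pattern piece by piece and tries it on a few made-up addresses. The pattern is written without the ↪ character that appears in the question's copy, and the names SIMPLE_EMAIL, UNANCHORED and is_simple_email are only illustrative.

```python
import re

# The question's pattern in verbose form, one comment per piece.
SIMPLE_EMAIL = re.compile(
    r"""
    ^                   # start of string
    [_a-z0-9-]+         # first part of the account name, 1..n characters
    (\.[_a-z0-9-]+)*    # 0..n further parts, each introduced by a literal dot
    @                   # literal @
    [a-z0-9-]+          # first domain label
    (\.[a-z0-9-]+)*     # 0..n further labels, each introduced by a literal dot
    (\.[a-z]{2,3})      # top-level domain: a dot plus 2 or 3 letters
    $                   # end of string (Python also tolerates one trailing newline here)
    """,
    re.VERBOSE,
)

# Same pattern without ^ and $, to show what the anchors are for.
UNANCHORED = re.compile(
    r"[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})"
)


def is_simple_email(address: str) -> bool:
    """Return True if the whole string fits the question's pattern."""
    return SIMPLE_EMAIL.search(address) is not None


if __name__ == "__main__":
    for candidate in [
        "john.doe@example.com",
        "john@mail.example.co.uk",
        "john@localhost",
        "see john@example.com now",
    ]:
        print(candidate, "->", is_simple_email(candidate))

    # Without the anchors the address is found inside the longer text.
    print(UNANCHORED.search("see john@example.com now").group(0))
```

The last call shows the point about ^ and $: without them, a string that merely contains an address somewhere still produces a match.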
Actually, to validate a proper email address you need to use something like this:
(?:(?:
)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\\]|\\.|(?:(?:
)?[ \t]))*"(?:(?:
)?[ \t])*)(?:\.(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\\]|\\.|(?:(?:
)?[
\t]))*"(?:(?:
)?[ \t])*))*@(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\\]|\\.)*\
](?:(?:
)?[ \t])*)(?:\.(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\\]|\\.)*\](?:
(?:
)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\\]|\\.|(?:(?:
)?[ \t]))*"(?:(?:
)
?[ \t])*)*\<(?:(?:
)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[
\t])*)(?:\.(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[ \t]
)*))*(?:,@(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[ \t])*
)(?:\.(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[ \t])*))*)
*:(?:(?:
)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\\]|\\.|(?:(?:
)?[ \t]))*"(?:(?:
)?[ \t])*)(?:\.(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\\]|\\.|(?:(?:
)?[ \t
]))*"(?:(?:
)?[ \t])*))*@(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\\]|\\.)*\](
?:(?:
)?[ \t])*)(?:\.(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\\]|\\.)*\](?:(?
:
)?[ \t])*))*\>(?:(?:
)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\\]|\\.|(?:(?:
)?
[ \t]))*"(?:(?:
)?[ \t])*)*:(?:(?:
)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\\]|
\\.|(?:(?:
)?[ \t]))*"(?:(?:
)?[ \t])*)(?:\.(?:(?:
)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\\]|\\.|(?:(?:
)?[ \t]))*"(?:(?:
)?[ \t])*))*@(?:(?:
)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[ \t])*)(?:\.(?:(?:
)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\\]|\\.|(
?:(?:
)?[ \t]))*"(?:(?:
)?[ \t])*)*\<(?:(?:
)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\\]|\\.)*\](?:(?:
)?[ \t])*)(?:\.(?:(?:
)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\\]|\\.)*\](?:(?:
)?[ \t])*))*(?:,@(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:
)?[ \t])*)(?:\.(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\\]
|\\.)*\](?:(?:
)?[ \t])*))*)*:(?:(?:
)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\\]|\\
.|(?:(?:
)?[ \t]))*"(?:(?:
)?[ \t])*)(?:\.(?:(?:
)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\\]|\\.|(?:(?:
)?[ \t]))*"(?:(?:
)?[ \t])*))*@(?:(?:
)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[ \t])*)(?:\.(?:(?:
)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[ \t])*))*\>(?:(?:
)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\\]|\\.|(?:(?:
)?[ \t]))*"(?:(?:
)?[ \t])*)(?:\.(?:(
?:
)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\\]|\\.|(?:(?:
)?[ \t]))*"(?:(?:
)?[ \t
])*))*@(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[ \t])*)(?
:\.(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\\]|\\.|(?:(?:
)?[ \t]))*"(?:(?:
)?[ \t])*)*\<(?:(?:
)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[ \t])*)(?:\.(?:(?:
)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[ \t])*))*(?:,@(?:(?:
)?[
\t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[ \t])*)(?:\.(?:(?:
)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[ \t])*))*)*:(?:(?:
)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\\]|\\.|(?:(?:
)?[ \t]))*"(?:(?:
)?[ \t])*)(?:\.(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\\]|\\.|(?:(?:
)?[ \t]))*"(?:(?:
)?[ \t])
*))*@(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[ \t])*)(?:\
.(?:(?:
)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\\]|\\.)*\](?:(?:
)?[ \t])*))*\>(?:(
?:
)?[ \t])*))*)?;\s*)
Source: http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html
This handles most of the complexities of the RFC. However, I recommend looking into filter_var instead:
filter_var('bob@example.com', FILTER_VALIDATE_EMAIL);
Easier to read and understand.
EDIT
I got curious. filter_var does its email validation with a regex behind the scenes as well. Here's the function that handles it, which can be found in ext/filter/logical_filters.c in the PHP source:
void php_filter_validate_email(PHP_INPUT_FILTER_PARAM_DECL) /* {{{ */
{
    /*
     * The regex below is based on a regex by Michael Rushton.
     * However, it is not identical. I changed it to only consider routeable
     * addresses as valid. Michael's regex considers a@b a valid address
     * which conflicts with section 2.3.5 of RFC 5321 which states that:
     *
     *   Only resolvable, fully-qualified domain names (FQDNs) are permitted
     *   when domain names are used in SMTP. In other words, names that can
     *   be resolved to MX RRs or address (i.e., A or AAAA) RRs (as discussed
     *   in Section 5) are permitted, as are CNAME RRs whose targets can be
     *   resolved, in turn, to MX or address RRs. Local nicknames or
     *   unqualified names MUST NOT be used.
     *
     * This regex does not handle comments and folding whitespace. While
     * this is technically valid in an email address, these parts aren't
     * actually part of the address itself.
     *
     * Michael's regex carries this copyright:
     *
     * Copyright © Michael Rushton 2009-10
     * http://squiloople.com/
     * Feel free to use and redistribute this code. But please keep this copyright notice.
     *
     */
const char regexp[] = "/^(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){255,})(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){65,}@)(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22))(?:\\.(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\\]))$/iD";
    pcre *re = NULL;
    pcre_extra *pcre_extra = NULL;
    int preg_options = 0;
    int ovector[150]; /* Needs to be a multiple of 3 */
    int matches;

    /* The maximum length of an e-mail address is 320 octets, per RFC 2821. */
    if (Z_STRLEN_P(value) > 320) {
        RETURN_VALIDATION_FAILED
    }

    re = pcre_get_compiled_regex((char *)regexp, &pcre_extra, &preg_options TSRMLS_CC);
    if (!re) {
        RETURN_VALIDATION_FAILED
    }
    matches = pcre_exec(re, NULL, Z_STRVAL_P(value), Z_STRLEN_P(value), 0, 0, ovector, 3);

    /* 0 means that the vector is too small to hold all the captured substring offsets */
    if (matches < 0) {
        RETURN_VALIDATION_FAILED
    }
}
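The control flow of that C function is short: reject anything longer than 320 octets, then run the regex, and fail if there is no match. A rough Python sketch of the same two steps follows. It is not PHP's implementation: the pattern is the simple one from the question used as a stand-in for the Rushton-derived regex, and validate_email, MAX_ADDRESS_OCTETS and STAND_IN_PATTERN are hypothetical names.

```python
import re
from typing import Optional

# Length limit quoted in the C comment above.
MAX_ADDRESS_OCTETS = 320

# Stand-in pattern (the simple regex from the question), NOT the one PHP uses.
STAND_IN_PATTERN = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"
)


def validate_email(value: str) -> Optional[str]:
    """Return the address unchanged if it passes both checks, else None."""
    # Step 1: cheap length check on the encoded octets, before any regex work.
    if len(value.encode("utf-8")) > MAX_ADDRESS_OCTETS:
        return None
    # Step 2: the regex match; no match means validation failed.
    if STAND_IN_PATTERN.search(value) is None:
        return None
    return value


if __name__ == "__main__":
    print(validate_email("bob@example.com"))
    print(validate_email("a" * 320 + "@example.com"))
    print(validate_email("not an address"))
```

Returning the value on success and a sentinel on failure mirrors how filter_var behaves from the caller's side, where a failed validation yields false instead of the input.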
You can use http://xenon.stanford.edu/~xusch/regexp/analyzer.html to have it explained to you. More tools are listed under "Is there an online RegexBuddy-like regular expression analyzer?" and at https://stackoverflow.com/questions/89718/is-there-anything-like-regexbuddy-in-the-open-source-world
Let me also just mention that your regex is insufficient. It fails on many valid addresses. See also http://www.regular-expressions.info/email.html
> I want to understand how that regular expression works not this @Viswanathan Iyer
^[_a-z0-9-]+ - the address must start with at least one lowercase letter, digit, '_' or '-'
(\.[_a-z0-9-]+)* - then it may contain further groups of those characters, each preceded by a dot
@ - no comment
[a-z0-9-]+ - the part after @ must start with at least one lowercase letter, digit or '-' (why?)
(\.[a-z0-9-]+)* - then the domain may contain further groups of lowercase letters, digits or '-', each preceded by a dot
(\.[a-z]{2,3})$ - at the end of the string: a dot before the domain zone, then a 2- or 3-letter domain zone (for example .net, .eu)
Note that this pattern is incorrect: it doesn't allow uppercase letters, and it doesn't support newer zones (such as .mobi) or non-Latin characters.
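A quick Python sketch shows the first two of those limitations in action. ORIGINAL is the question's pattern (without the ↪ character); RELAXED is a hypothetical variant, not a recommended validator, that only adds case-insensitive matching and allows a zone of two or more letters. The sample addresses are made up.

```python
import re

ORIGINAL = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"
)

# Hypothetical relaxation: ignore case (ASCII only) and accept longer zones.
RELAXED = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,})$",
    re.IGNORECASE | re.ASCII,
)

SAMPLES = [
    "bob@example.com",      # plain lowercase address
    "Bob@Example.com",      # uppercase letters
    "bob@example.mobi",     # four-letter zone
    "bob+tag@example.com",  # '+' in the account name
]


def check(pattern: "re.Pattern[str]", address: str) -> bool:
    """Return True if the pattern accepts the whole address."""
    return pattern.search(address) is not None


if __name__ == "__main__":
    for address in SAMPLES:
        print(address, check(ORIGINAL, address), check(RELAXED, address))
```

Even the relaxed variant still rejects the '+' address, which is one reason patching this pattern piecemeal does not turn it into a dependable validator.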
Note that if you're trying to match e-mail addresses with this, the regex is completely outdated (and has been outdated for ten years), Bad and Wrong. In other words, this is completely unsuitable for validating e-mail addresses, as it will incorrectly reject many valid addresses. See this and this for further discussion.
Here is what it does:
^
beginning of string
[_a-z0-9-]+
1 or more characters from _abcdefghijklmnopqrstuvwxyz0123456789-
(\.[_a-z0-9-]+)*
0 or more times:
    \.
    literal dot
    [_a-z0-9-]+
    same as the second item
@
literal @
[a-z0-9-]+
1 or more characters from abcdefghijklmnopqrstuvwxyz0123456789-
(\.[a-z0-9-]+)
exactly one time: a literal dot, then 1 or more characters from the same set as the previous item
↪*
literal ↪, zero or more times (the ↪ is probably a line-wrap mark that was copied along with the pattern; if so, the intended token is a * applied to the previous group)
(\.[a-z]{2,3})
exactly one time:
    \.
    literal dot
    [a-z]{2,3}
    2 or 3 characters from abcdefghijklmnopqrstuvwxyz
$
end of string
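Reading the ↪ as a literal character changes what the pattern accepts, which a short Python sketch can show. AS_WRITTEN keeps the ↪ (U+21AA) exactly where the question has it; INTENDED assumes the ↪ is a copying artifact and drops it. Both names and the sample addresses are illustrative only.

```python
import re

HOOK = "\u21aa"  # the ↪ character from the question's pattern

# Pattern exactly as posted: the second domain group is required once,
# followed by zero or more literal hook characters.
AS_WRITTEN = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)"
    + HOOK
    + r"*(\.[a-z]{2,3})$"
)

# Assumed intent: the * applies to the second domain group.
INTENDED = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"
)


def accepts(pattern: "re.Pattern[str]", address: str) -> bool:
    """Return True if the pattern accepts the whole address."""
    return pattern.search(address) is not None


if __name__ == "__main__":
    for address in [
        "bob@example.com",
        "bob@mail.example.com",
        "bob@mail.example" + HOOK + ".com",
    ]:
        print(address, accepts(AS_WRITTEN, address), accepts(INTENDED, address))
```

With the literal reading, an ordinary two-label domain such as example.com is rejected because one extra dot-separated label is mandatory, which supports the idea that the ↪ was never meant to be part of the regex.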