Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

Mail::RFC822::Address Regex

For RFC 822 email address validation, there is a monstrous Perl-based regular expression, and it produces many errors when I try to use it in online regex testers:

(?:(?:
)?[ ])*(?:(?:(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ]
)+||(?=[["()<>@,;:".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:

)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(
?:
)?[ ])+||(?=[["()<>@,;:".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ 
]))*"(?:(?:
)?[ ])*))*@(?:(?:
)?[ ])*(?:[^()<>@,;:".[] 00-
31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[]]))|[([^[]
\]|\.)*
](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:".[] 00-31]+
(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[]]))|[([^[]
\]|\.)*](?:
(?:
)?[ ])*))*|(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+|
|(?=[["()<>@,;:".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)
?[ ])*)*<(?:(?:
)?[ ])*(?:@(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
r
)?[ ])+||(?=[["()<>@,;:".[]]))|[([^[]
\]|\.)*](?:(?:
)?[
 ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)
?[ ])+||(?=[["()<>@,;:".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ]
)*))*(?:,@(?:(?:
)?[ ])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[
 ])+||(?=[["()<>@,;:".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*
)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ]
)+||(?=[["()<>@,;:".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*)
*:(?:(?:
)?[ ])*)?(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+
||(?=[["()<>@,;:".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:


)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:

)?[ ])+||(?=[["()<>@,;:".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ 
]))*"(?:(?:
)?[ ])*))*@(?:(?:
)?[ ])*(?:[^()<>@,;:".[] 00-31
]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[]]))|[([^[]
\]|\.)*](
?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:".[] 00-31]+(?
:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[]]))|[([^[]
\]|\.)*](?:(?
:
)?[ ])*))*>(?:(?:
)?[ ])*)|(?:[^()<>@,;:".[] 00-31]+(?:(?
:(?:
)?[ ])+||(?=[["()<>@,;:".[]]))|"(?:[^"
\]|\.|(?:(?:
)?
[ ]))*"(?:(?:
)?[ ])*)*:(?:(?:
)?[ ])*(?:(?:(?:[^()<>@,;:".[] 
00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[]]))|"(?:[^"
\]|
\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>
@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[]]))|"
(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*))*@(?:(?:
)?[ ]
)*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\
".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?
:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[
]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*|(?:[^()<>@,;:".[] 00-
31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[]]))|"(?:[^"
\]|\.|(
?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)*<(?:(?:
)?[ ])*(?:@(?:[^()<>@,;
:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[]]))|[([
^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:"
.[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[]]))|[([^[
]
\]|\.)*](?:(?:
)?[ ])*))*(?:,@(?:(?:
)?[ ])*(?:[^()<>@,;:".
[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[]]))|[([^[]
r\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,;:".[] 
00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[]]))|[([^[]
\]
|\.)*](?:(?:
)?[ ])*))*)*:(?:(?:
)?[ ])*)?(?:[^()<>@,;:".[] 
00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[]]))|"(?:[^"
\]|\
.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[^()<>@,
;:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[]]))|"(?
:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*))*@(?:(?:
)?[ ])*
(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".
[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ])*(?:[
^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[]
]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*>(?:(?:
)?[ ])*)(?:,s*(
?:(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\
".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)(?:.(?:(
?:
)?[ ])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[
["()<>@,;:".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ 
])*))*@(?:(?:
)?[ ])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ 
])+||(?=[["()<>@,;:".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?
:.(?:(?:
)?[ ])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+|
|(?=[["()<>@,;:".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*|(?:
[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".[
]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)*<(?:(?:
)
?[ ])*(?:@(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["
()<>@,;:".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)
?[ ])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>
@,;:".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*(?:,@(?:(?:
)?[
 ])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,
;:".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:.(?:(?:
)?[ ]
)*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:\
".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*)*:(?:(?:
)?[ ])*)?
(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[["()<>@,;:".
[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])*)(?:.(?:(?:

)?[ ])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+||(?=[[
"()<>@,;:".[]]))|"(?:[^"
\]|\.|(?:(?:
)?[ ]))*"(?:(?:
)?[ ])
*))*@(?:(?:
)?[ ])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])
+||(?=[["()<>@,;:".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*)(?:
.(?:(?:
)?[ ])*(?:[^()<>@,;:".[] 00-31]+(?:(?:(?:
)?[ ])+|
|(?=[["()<>@,;:".[]]))|[([^[]
\]|\.)*](?:(?:
)?[ ])*))*>(?:(
?:
)?[ ])*))*)?;s*) 

The question is "why?"

Isn't there a shorter regular expression that verifies email addresses simply, accurately, and quickly, like the ones used in so many frameworks?

And as you can see, it isn't there just to be read: it's used in a module, which means it was built for real-world use!

However, there are many constructs in this expression that just make it longer and longer, like [^\[\]\r\\], which could be replaced by [^][\r\\].
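A quick way to test that claim is to compare both classes character by character. This sketch uses Python's re module (an assumption: the original is a Perl module, but Perl and PCRE also treat a "]" placed first in a class as a literal and need no escape for "[" inside a class) and checks every code point below U+3000.

```python
import re

# The class as the generated regex writes it: every [ and ] escaped.
ESCAPED = re.compile(r"[^\[\]\r\\]")
# The shorter form: "]" right after "[^" is literal, and "[" needs no escape.
SHORT = re.compile(r"[^][\r\\]")

def classes_agree(a, b, upto=0x3000):
    """True if both single-character patterns accept exactly the same code points below `upto`."""
    return all(
        (a.fullmatch(ch) is None) == (b.fullmatch(ch) is None)
        for ch in map(chr, range(upto))
    )

print(classes_agree(ESCAPED, SHORT))  # prints True
```

One caveat: the shortcut is not portable. In JavaScript, [^] is a complete class that matches any character, so [^][\r\\] would be read as [^] followed by [\r\\]. A generator that has to emit one pattern for many engines has a reason to escape everything.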

I think the RFC, as a standard, should specify what is acceptable to everyone without being this complex and mind-blowing!

Update

Besides all of the above, I should mention that the online validator that page links to reports revo@test&^%$#|.com as a valid email address. However, RFC 1035 says that domain name labels "must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length."
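The RFC 1035 rule quoted above is a separate layer from RFC 822 address syntax and can be checked on its own. This is a minimal Python sketch; it assumes the domain is simply everything after the last @ and applies the RFC 1035 limits of 63 characters per label and 253 characters for the dotted name (255 octets in wire format). Note that RFC 1123 later relaxed the first-character rule to allow digits, so a strict check like this one rejects some real host names.

```python
import re

# RFC 1035, section 2.3.1 (preferred name syntax) for a single label:
# starts with a letter, ends with a letter or digit, interior letters/digits/hyphens.
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")
MAX_LABEL = 63   # characters per label
MAX_NAME = 253   # dotted-text length that fits the 255-octet wire limit

def is_rfc1035_domain(name: str) -> bool:
    """True if every dot-separated label follows the RFC 1035 preferred syntax."""
    if not name or len(name) > MAX_NAME:
        return False
    return all(
        len(label) <= MAX_LABEL and LABEL_RE.fullmatch(label) is not None
        for label in name.split(".")
    )

def domain_of(address: str) -> str:
    # Assumption: the domain is whatever follows the last "@".
    return address.rpartition("@")[2]

print(is_rfc1035_domain(domain_of("revo@test&^%$#|.com")))  # False
print(is_rfc1035_domain(domain_of("revo@test.com")))        # True
```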

Answer

> For RFC 822 email address validation, there is a monstrous Perl-based regular expression, and it produces many errors when I try to use it in online regex testers

Well, that's because it's "formatted" with newlines; you have to remove them. You can do that with a simple regex replace of \r?\n (with nothing) in your favorite editor.

With the newlines removed, it works without errors!
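To see why the line breaks matter, here is a small Python sketch (Python's re is an assumption; PCRE-based testers should behave the same way for these two cases). Both wrapped fragments are made up for illustration: in the first, a break falls inside \r, so the pattern still compiles but silently means something else; in the second, the break falls between "(?" and ":", which is a syntax error. The same \r?\n replacement repairs both.

```python
import re

def unwrap(pattern_text: str) -> str:
    """Remove every CRLF or LF line break so a wrapped regex becomes one line."""
    return re.sub(r"\r?\n", "", pattern_text)

def compiles(pattern_text: str) -> bool:
    """True if Python's re accepts the pattern."""
    try:
        re.compile(pattern_text)
        return True
    except re.error:
        return False

# Hypothetical fragments, wrapped at arbitrary columns like the published listing.
# 1) Break inside "\r": compiles, but now expects a newline, the letter r, a newline.
WRAPPED_ESCAPE = r"""(?:(?:\
r\n)?[ \t])*[a-z]+@[a-z]+(?:\.[a-z]+)*"""
# 2) Break between "(?" and ":": not a valid group any more.
WRAPPED_GROUP = r"""(?
:[a-z]+)@[a-z]+"""

folded = "\r\n user@example.com"  # CRLF + space, i.e. folded white space
print(re.fullmatch(WRAPPED_ESCAPE, folded))                      # None
print(re.fullmatch(unwrap(WRAPPED_ESCAPE), folded) is not None)  # True
print(compiles(WRAPPED_GROUP), compiles(unwrap(WRAPPED_GROUP)))  # False True
```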


> Isn't there a shorter regular expression that verifies email addresses simply, accurately, and quickly, like the ones used in so many frameworks?

No, there isn't, because the specification of email addresses is complex. If you want to keep it simple, just check for \S+@\S+ and then send a verification email. That's the only reliable way to prove that the address is valid and that it actually exists: even a syntactically valid address may not exist.
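In Python (an assumption; any regex engine behaves the same here), that loose check is a one-liner. It should be anchored to the whole string: searched for without anchors, \S+@\S+ also finds a match inside a sentence such as "mail me at a@b today".

```python
import re

# Loose sanity check: some non-whitespace, an @, some more non-whitespace.
LOOSE_EMAIL = re.compile(r"\S+@\S+")

def looks_like_email(text: str) -> bool:
    """Whole-string check only; anything that passes still needs a verification email."""
    return LOOSE_EMAIL.fullmatch(text) is not None

for candidate in ["user@example.com", "user example.com", "mail me at a@b today"]:
    print(candidate, looks_like_email(candidate))  # True, False, False
```

The check deliberately accepts odd addresses like revo@test&^%$#|.com; rejecting those is left to the verification email.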


> However, there are many constructs in this expression that just make it longer and longer, like [^\[\]\r\\], which could be replaced by [^][\r\\]

Well, the author already says that it is generated rather than handwritten, and that it could contain bugs. Quoting from the author's page:

  • I did not write this regular expression by hand. It is generated by the Perl module by concatenating a simpler set of regular expressions that relate directly to the grammar defined in the RFC.
  • I do not maintain the regular expression below. There may be bugs in it that have already been fixed in the Perl module.
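To make that description concrete, here is a Python sketch of the same kind of construction: one small pattern per RFC 822 grammar rule, composed by string interpolation. It is not the module's code. The rule names follow the RFC 822 grammar, the strings approximate what the published expression contains, Python's re stands in for Perl, and comment stripping (which the module does before matching) is omitted. The domain literal uses a non-capturing group here, whereas the published expression uses a capturing one; that only affects group numbering.

```python
import re

# One small pattern per RFC 822 grammar rule, composed by interpolation (sketch only).
LWSP = r"(?:(?:\r\n)?[ \t])"            # linear white space, incl. folded CRLF
SPECIALS = r'()<>@,;:\\".\[\]'          # RFC 822 "specials"
CONTROLS = r"\000-\031"                  # control characters

# The lookahead marks where an atom ends, so atoms cannot be split while backtracking.
ATOM = rf'[^{SPECIALS} {CONTROLS}]+(?:{LWSP}+|\Z|(?=[\["{SPECIALS}]))'
QUOTED_STRING = rf'"(?:[^\"\r\\]|\\.|{LWSP})*"{LWSP}*'
DOMAIN_LITERAL = rf"\[(?:[^\[\]\r\\]|\\.)*\]{LWSP}*"

WORD = f"(?:{ATOM}|{QUOTED_STRING})"
LOCAL_PART = rf"{WORD}(?:\.{LWSP}*{WORD})*"
SUB_DOMAIN = f"(?:{ATOM}|{DOMAIN_LITERAL})"
DOMAIN = rf"{SUB_DOMAIN}(?:\.{LWSP}*{SUB_DOMAIN})*"
ADDR_SPEC = f"{LOCAL_PART}@{LWSP}*{DOMAIN}"

PHRASE = f"{WORD}*"
ROUTE = f"(?:@{DOMAIN}(?:,@{LWSP}*{DOMAIN})*:{LWSP}*)"
ROUTE_ADDR = f"<{LWSP}*{ROUTE}?{ADDR_SPEC}>{LWSP}*"
MAILBOX = f"(?:{ADDR_SPEC}|{PHRASE}{ROUTE_ADDR})"
GROUP = rf"{PHRASE}:{LWSP}*(?:{MAILBOX}(?:,\s*{MAILBOX})*)?;\s*"
ADDRESS = f"(?:{MAILBOX}|{GROUP})"

RFC822_RE = re.compile(f"{LWSP}*{ADDRESS}")

def is_rfc822_address(text: str) -> bool:
    """Whole-string match; unlike the Perl module, comments are not stripped first."""
    return RFC822_RE.fullmatch(text) is not None

print(len(RFC822_RE.pattern))  # several thousand characters
for s in ["user@example.com", "John Doe <john@example.com>",
          "friends: a@b.c, d@e.f;", "revo@test&^%$#|.com", "not an address"]:
    print(s, is_rfc822_address(s))
```

Because MAILBOX is pasted in three times and each copy repeats ADDR_SPEC and DOMAIN, a source that is a few dozen lines long expands into one pattern of several thousand characters. The sketch also accepts revo@test&^%$#|.com: in RFC 822, &^%$#| are ordinary atom characters, and the RFC 1035 host name rule is a different standard.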

Moral of the story

Do not waste your time validating email addresses; if you must, just check that there is an @ somewhere in the middle. Then send an email to verify it.
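A minimal Python sketch of that advice: the only syntax check is an @ that is neither the first nor the last character, and the real proof is a one-time token sent to the address. The class name, the in-memory dictionary and the 24-hour lifetime are hypothetical choices for illustration, and actually sending the email is left out.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

def has_at_in_middle(address: str) -> bool:
    """The only syntax check: an @ that is neither the first nor the last character."""
    return "@" in address[1:-1]

class PendingVerifications:
    """Hypothetical in-memory store of one-time tokens (a real app would persist them)."""

    def __init__(self, ttl_seconds: float = 24 * 3600) -> None:
        self.ttl_seconds = ttl_seconds
        self._pending: Dict[str, Tuple[str, float]] = {}

    def start(self, address: str) -> str:
        """Register an address and return the token to put in the confirmation email."""
        if not has_at_in_middle(address):
            raise ValueError(f"not an email address: {address!r}")
        token = secrets.token_urlsafe(32)
        self._pending[token] = (address, time.monotonic() + self.ttl_seconds)
        return token

    def confirm(self, token: str) -> Optional[str]:
        """Return the address once if the token is known and unexpired, else None."""
        entry = self._pending.pop(token, None)
        if entry is None:
            return None
        address, expires_at = entry
        return address if time.monotonic() <= expires_at else None

store = PendingVerifications()
token = store.start("user@example.com")
# ... email a link containing `token`; when the user clicks it:
print(store.confirm(token))  # user@example.com
print(store.confirm(token))  # None: tokens are single-use
```

time.monotonic() is only meaningful within one running process; a store that survives restarts would record wall-clock expiry times instead.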



...