Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - extract all email addresses from some .txt documents using ruby

I have to extract all email addresses from some .txt documents. These emails may have these formats:

  1. [email protected]
  2. {a, b, c}@abc.edu
  3. some other formats including some @ signs.

I chose Ruby as my first language for writing this program, but I don't know how to write the regex. Could someone help me? Thank you!


1 Reply


Depending on the nature of your .txt documents, you don't have to use one of the complicated regexes that attempt to validate email addresses. You're not trying to validate anything. You're just trying to grab what's already there. Generally speaking, a regex to grab what's already there can be much simpler than a regex that needs to validate input.

An important question is whether your .txt documents contain @ signs that are not part of an email address you want to extract.

This regex handles your first two requirements:

\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+
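
As a minimal sketch of how this pattern could be used in Ruby, the snippet below applies it with String#scan, which returns every match as an array of strings when the pattern has no capturing groups. The backslashes are written out (\w, \{, \}). The constant EMAIL_PATTERN, the helper extract_emails, and the sample text are illustrative names and data, not part of the original question.

```ruby
# Pattern for plain addresses and {a, b, c}@domain groups,
# with the backslashes written out.
EMAIL_PATTERN = /\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+/

# Hypothetical helper: returns every match found in the given text.
# The pattern has only non-capturing groups, so scan yields whole matches.
def extract_emails(text)
  text.scan(EMAIL_PATTERN)
end

# Made-up sample text for illustration.
sample = "Contact alice@abc.edu or {bob, carol, dave}@abc.edu for details"
p extract_emails(sample)
# => ["alice@abc.edu", "{bob, carol, dave}@abc.edu"]
```

To run this over files, something like Dir.glob("*.txt").flat_map { |path| extract_emails(File.read(path)) } should work, assuming the documents sit in the current directory. Note that [\w.-]+ also accepts a trailing period, so an address that ends a sentence will be captured with the period attached.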

Or if you want to allow any sequence of non-space characters containing an @ sign, plus your second requirement (which has spaces):

\S+@\S+|\{(?:\w+, *)+\w+\}@[\w.-]+
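
The looser pattern can be tried the same way. Because \S+ accepts any non-space character, matches tend to pick up adjacent punctuation, so this sketch strips trailing periods, commas, semicolons, and colons after matching. That cleanup step is an assumption about what the documents look like, and LOOSE_PATTERN, extract_loose, and the sample text are hypothetical names and data.

```ruby
# Looser pattern: any run of non-space characters around an @ sign,
# plus the {a, b, c}@domain form, which contains spaces.
LOOSE_PATTERN = /\S+@\S+|\{(?:\w+, *)+\w+\}@[\w.-]+/

# Hypothetical helper: scans the text, then drops trailing punctuation
# that the permissive pattern tends to capture.
def extract_loose(text)
  text.scan(LOOSE_PATTERN).map { |match| match.sub(/[.,;:]+\z/, "") }
end

# Made-up sample text for illustration.
sample = "Mail first.last@abc.edu, or {a, b, c}@abc.edu."
p extract_loose(sample)
# => ["first.last@abc.edu", "{a, b, c}@abc.edu"]
```

Whether the stricter or the looser pattern is the better fit depends on the question raised above: how many @ signs in the documents are not part of an address you want.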
