This code should work (I could only test the regex part though):
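import re
import openpyxl  # sheet_select() is assumed to return an openpyxl worksheet

def sort_email_from_xl():
    sheet = sheet_select()  # Opens the worksheet (helper from your question)
    emailRegex = re.compile(r".*?([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}).*?")
    customeremails = []
    # iter_rows() yields one tuple of cells per row, so no max_row variable is needed
    for row in sheet.iter_rows():
        for cell in row:
            # openpyxl keeps cell contents in .value; skip empty and numeric cells
            if not isinstance(cell.value, str):
                continue
            match = emailRegex.match(cell.value)
            if match:
                mail = match.groups()[0]
                cell.value = mail  # keep only the address (persists once the workbook is saved)
                customeremails.append(mail)
    print(customeremails)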
There were many problems with your code. First, the regex:
- the regex did not allow any text around your email address; I added that with `.*?` at the start and end
- you don't need the `re.VERBOSE` part, as you'd only need it if you want to add inline comments to your regex, see the docs
- you allowed email addresses with many `@` in between
- you matched the TLD separately, which is unneeded
Now, the email regex works for basic usage, but I'd definitely recommend taking a proven email regex from other answers on Stack Overflow.
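To make the regex points above concrete, here is a small sketch that runs the pattern on a few sample strings. It assumes the dot before the TLD is escaped as `\.` (so it only matches a literal dot), and it also writes the same pattern with `re.VERBOSE` to show what that flag is for: whitespace is ignored and `#` starts a comment. The helper `extract` and the sample strings are made up for illustration.

```python
import re

# Compact form, with the dot before the TLD escaped as \.
EMAIL = re.compile(r".*?([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}).*?")

# The same pattern with re.VERBOSE: whitespace is ignored and '#' starts
# a comment, so the pattern can be annotated piece by piece.
EMAIL_VERBOSE = re.compile(
    r"""
    .*?                    # any leading text (lazy)
    (                      # group 1: the address itself
      [a-zA-Z0-9._%+-]+    # local part; '@' is not allowed here
      @
      [a-zA-Z0-9.-]+       # domain
      \.[a-zA-Z]{2,4}      # dot and TLD, matched as part of the address
    )
    .*?                    # any trailing text
    """,
    re.VERBOSE,
)


def extract(pattern, text):
    """Return the captured address, or None if the pattern does not match."""
    match = pattern.match(text)
    return match.groups()[0] if match else None


samples = [
    "Contact: john.doe@example.com (work)",  # text around the address
    "a@@b.com",                              # more than one '@'
    "no address here",
    "user@example.museum",                   # TLD longer than 4 letters
]
for text in samples:
    print(repr(text), "->", extract(EMAIL, text), extract(EMAIL_VERBOSE, text))
```

Both patterns give identical results. The last sample comes back as `user@example.muse`: there is no end anchor, so the TLD part simply stops after four letters. That is one concrete reason to prefer a proven regex over this basic one.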
Then: with `emailRegex.match(...)` you can check whether the cell's text matches your regex, and with `emailRegex.match(...).groups()[0]` you extract only the matching part (the address captured by the group). You also had one `return` statement too many.
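A minimal sketch of the match/extract step on a plain string, assuming the same regex as above (dot escaped). It calls `match()` once and reuses the result instead of running the regex twice, and shows that `groups()[0]` is the same as `group(1)`. As an alternative, `re.search` scans the string itself, so that version needs no `.*?` around the pattern. The function names are hypothetical.

```python
import re

EMAIL_ANYWHERE = re.compile(r".*?([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}).*?")
EMAIL_ONLY = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}")


def first_email(text):
    """Return the first address in text, or None, running the regex only once."""
    match = EMAIL_ANYWHERE.match(text)  # a Match object, or None if nothing matched
    if match is None:
        return None
    # groups() is the tuple of all capture groups; groups()[0] equals group(1)
    return match.group(1)


def first_email_search(text):
    """Same idea with search(), which scans the string, so no .*? is needed."""
    match = EMAIL_ONLY.search(text)
    return match.group(0) if match else None


print(first_email("Mail me at jane@example.org today"))
print(first_email_search("Mail me at jane@example.org today"))
```

Both calls print `jane@example.org`; for input with no address both functions return `None`.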
For some reason the above code is giving me a NameError: name 'max_row' is not defined
You need to fix how you loop through the rows, e.g. as documented here
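The row loop can be written without any `max_row` variable by iterating over the rows directly. The sketch below assumes the rows come from openpyxl's `sheet.iter_rows(values_only=True)`, which yields one tuple of cell values per row; to keep it runnable without a spreadsheet, the demo feeds it plain tuples instead. `emails_from_rows` and the sample data are hypothetical.

```python
import re

EMAIL = re.compile(r".*?([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}).*?")


def emails_from_rows(rows):
    """Collect the first email address found in each string cell.

    `rows` is any iterable of row tuples, e.g. what
    sheet.iter_rows(values_only=True) yields in openpyxl.
    """
    found = []
    for row in rows:
        for value in row:
            if not isinstance(value, str):  # skip empty (None) and numeric cells
                continue
            match = EMAIL.match(value)
            if match:
                found.append(match.group(1))
    return found


# Plain tuples standing in for worksheet rows
rows = [
    ("Name", "Contact"),
    ("Ann", "ann@example.com"),
    ("Bob", None),
    (42, "call bob at bob.smith@mail.example.org please"),
]
print(emails_from_rows(rows))  # ['ann@example.com', 'bob.smith@mail.example.org']
```

With a real worksheet you would call `emails_from_rows(sheet.iter_rows(values_only=True))`. If you also want to write the address back into the cell, iterate `sheet.iter_rows()` without `values_only` so you get cell objects and can assign `cell.value`.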