Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Regex: Why do empty strings get included (in a list of tuples) in re.findall()?

According to the pattern match in the online regex tester I used, the matches are 213.239.250.131 and 014.10.26.06.

Yet when I run the generated Python code and print out the value of re.findall(p, test_str), I get:

[('', '', '213.239.250.131'), ('', '', '014.10.26.06')]

I could hack around the list and its tuples to get the values I'm looking for (the IP addresses), but (i) they might not always be in the same position in the tuples, and (ii) I'd rather understand what's going on here so I can either tighten up the regex or extract only the IP addresses using Python's own re functionality.

Why do I get this list of tuples, why the apparently empty matches, and how do I ensure that only the IP addresses are returned?

1 Reply

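When a pattern contains capturing groups, re.findall returns the contents of the groups instead of the whole match: a list of strings if there is one group, or a list of tuples (one element per group) if there are several. A group that did not take part in a particular match is still reported, as an empty string (not whitespace). Your pattern has 3 capturing groups, so every match shows up as a 3-tuple, and the groups that did not participate in that match come back as ''.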
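The question's exact regex is not shown, so here is a minimal sketch using a hypothetical stand-in with three capturing groups, one per alternative. The "localhost" branch is invented purely to show a second tuple shape; the point is only how findall reports groups that did not match.

```python
import re

# Hypothetical stand-in for the question's regex: three capturing groups,
# one per alternative (the "localhost" branch is invented for illustration).
pattern = re.compile(
    r'((?:[a-z0-9]{1,4}:+){3,5}[a-z0-9]{1,4})'  # group 1: colon-separated address
    r'|(localhost)'                              # group 2: a literal host name
    r'|(\d{1,3}(?:\.\d{1,3}){3})'                # group 3: dotted IPv4-like address
)
text = "from example.com [213.239.250.131] and localhost"

# With more than one group, findall returns one tuple per match;
# groups that did not take part in that match are reported as ''.
result = pattern.findall(text)
print(result)  # [('', '', '213.239.250.131'), ('', 'localhost', '')]
```

With exactly one capturing group, findall would return a flat list of strings instead of tuples.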

On regex101.com, you can see these non-participating groups by turning them on in Options.
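Python makes the same distinction on Match objects: a non-participating group is None in match.groups(), and findall substitutes an empty string for it. A toy sketch; the cat/dog pattern is made up for illustration and is not from the question:

```python
import re

# Toy pattern (not from the question): each alternative has its own group.
m = re.search(r'(cat)|(dog)', 'hot dog')

raw = m.groups()                   # (None, 'dog'): group 1 did not participate
as_findall = m.groups(default='')  # ('', 'dog'): the form findall reports
found = re.findall(r'(cat)|(dog)', 'hot dog')  # [('', 'dog')]

print(raw, as_findall, found)
```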

You can tighten up your regex by turning the capturing groups into non-capturing ones, (?:...):

(?:[a-z0-9]{1,4}:+){3,5}[a-z0-9]{1,4}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}

Or, more compactly:

(?:[a-z0-9]{1,4}:+){3,5}[a-z0-9]{1,4}|\d{1,3}(?:\.\d{1,3}){3}
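The compact form works because (?:\.\d{1,3}){3} repeats the non-capturing ".ddd" chunk exactly three times, which is the same as spelling it out. A quick sketch comparing both patterns on a hypothetical sample string (the addresses below are made up):

```python
import re

# The long and compact variants of the answer's pattern.
long_p = re.compile(r'(?:[a-z0-9]{1,4}:+){3,5}[a-z0-9]{1,4}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
short_p = re.compile(r'(?:[a-z0-9]{1,4}:+){3,5}[a-z0-9]{1,4}|\d{1,3}(?:\.\d{1,3}){3}')

# Hypothetical sample text with two dotted addresses and one colon-separated one.
sample = "hosts 10.0.0.1, 192.168.1.254 and fe80:0:0:0:1"

# No capturing groups in either pattern, so findall returns whole matches.
print(long_p.findall(sample))
print(short_p.findall(sample))
```

Both calls should print the same list, since neither pattern has capturing groups.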


Since the pattern no longer contains capturing groups, re.findall returns the whole matches rather than the contents of capturing groups:

import re

# Only non-capturing groups, so findall returns whole matches
p = re.compile(r'(?:[a-z0-9]{1,4}:+){3,5}[a-z0-9]{1,4}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
# Triple quotes allow the header text to span several lines
test_str = """from mail.example.com (example.com. [213.239.250.131]) by
 mx.google.com with ESMTPS id xc4si15480310lbb.82.2014.10.26.06.16.58 for
 <[email protected]> (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256
 bits=128/128); Sun, 26 Oct 2014 06:16:58 -0700 (PDT)"""
print(p.findall(test_str))

Output of the online Python demo:

['213.239.250.131', '014.10.26.06']
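If you need to keep capturing groups in the pattern (for example, to pull out individual octets), another route is re.finditer: each Match object exposes the whole match as group(0), regardless of how many groups the pattern has. A minimal sketch, assuming a hypothetical octet-capturing pattern and sample text:

```python
import re

# Hypothetical pattern that captures each octet, so findall returns 4-tuples.
octets = re.compile(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})')
text = "from example.com [213.239.250.131] via 10.0.0.1"

groups_only = octets.findall(text)  # tuples of group contents, not full matches
print(groups_only)

# finditer yields Match objects; group(0) is always the entire match.
ips = [m.group(0) for m in octets.finditer(text)]
print(ips)
```

This keeps the groups available (m.group(1) through m.group(4)) while still giving you the full addresses.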

...