Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Word boundary with regex - cannot extract all words

I need to extract Male-Cat twice from this string:

import re

a = "Male-Cat Male-Cat Male-Cat-Female"
b = re.findall(r'(?:\s|^)Male-Cat(?:\s|$)', a)
print (b)
['Male-Cat ']

c = re.findall(r'Male-Cat', a)
print (c)
['Male-Cat', 'Male-Cat', 'Male-Cat']

And I need to extract Male-Cat three times from this one:

a = "Male-Cat Male-Cat Male-Cat"
b = re.findall(r'(?:\s|^)Male-Cat(?:\s|$)', a)
print (b)
['Male-Cat ', ' Male-Cat']

c = re.findall(r'Male-Cat', a)
print (c)
['Male-Cat', 'Male-Cat', 'Male-Cat']

Other strings that the first approach parses correctly:

a = 'Male-Cat Female-Cat Male-Cat-Female Male-Cat'
a = 'Male-Cat-Female'
a = 'Male-Cat'

Am I missing something? Can you explain what is wrong and what the correct way is?


1 Reply


Use lookarounds to extract words inside whitespace boundaries:

r'(?<!\S)Male-Cat(?!\S)'


Details

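  • (?<!\S) - a negative lookbehind: no non-whitespace character may appear immediately to the left of the current location, so there must be either whitespace or the start of the string
  • Male-Cat - the term to search for
  • (?!\S) - a negative lookahead: no non-whitespace character may appear immediately to the right of the current location, so there must be either whitespace or the end of the string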

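Since (?<!\S) and (?!\S) are zero-width assertions, the whitespace is not consumed, so consecutive matches are found. In the original pattern, (?:\s|$) consumes the space after the first Male-Cat, which leaves no whitespace for (?:\s|^) to match before the second one.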
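As a rough illustration of the difference, here is a minimal sketch that runs both patterns over the strings from the question. The helper name find_whole_tokens is hypothetical (it is not part of the original answer), and the use of re.escape is an added assumption so that the term is always treated as literal text.

```python
import re

# Hypothetical helper, not from the original answer: wraps the lookaround
# pattern so that any literal term can be searched for.
def find_whole_tokens(term, text):
    """Return each occurrence of term bounded by whitespace or the string edges."""
    # re.escape is an added assumption: it keeps characters such as '-' or '.'
    # in the term from being read as regex syntax.
    pattern = r'(?<!\S)' + re.escape(term) + r'(?!\S)'
    return re.findall(pattern, text)

samples = [
    "Male-Cat Male-Cat Male-Cat-Female",
    "Male-Cat Male-Cat Male-Cat",
    "Male-Cat Female-Cat Male-Cat-Female Male-Cat",
    "Male-Cat-Female",
    "Male-Cat",
]

for text in samples:
    # Consuming version from the question: the matched whitespace is used up,
    # so an adjacent occurrence cannot match.
    consuming = re.findall(r'(?:\s|^)Male-Cat(?:\s|$)', text)
    # Zero-width version from the answer: nothing around the term is consumed.
    lookaround = find_whole_tokens("Male-Cat", text)
    print(repr(text))
    print("  consuming: ", consuming)
    print("  lookaround:", lookaround)
```

With the lookaround version the results contain only the term itself, with no leading or trailing spaces, so no stripping is needed afterwards.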

