Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - What is the regex to match the words containing all the vowels?

I am learning regex in Python but can't seem to get the hang of it. I am trying to filter out all the words that contain all the vowels in English, and this is my regex:

r'(\S*[aeiou]){5}'

It seems too vague, since any vowel (even a repeated one) can appear at any place and any number of times, so it returns words like 'actionable' and 'unfortunate', which do contain five vowels but not all five different vowels. I looked around the internet and found this regex:

r'[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*'

But as it appears, it only matches when the vowels appear in sequence (a, e, i, o, u, in that order), which is a much more limited task than the one I am trying to accomplish. Can someone 'think out loud' while crafting the regex for my problem?


1 Reply

If you plan to match words as chunks of text consisting only of English letters, you may use a regex like

\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b

See the regex demo

To support languages other than English, you may replace [a-zA-Z]+ with [^\W\d_]+.

If a "word" you want to match is a chunk of non-whitespace chars, you may use

(?<!\S)(?=\S*?a)(?=\S*?e)(?=\S*?i)(?=\S*?o)(?=\S*?u)\S+

See this regex demo.

Define these patterns in Python using raw string literals, e.g.:

rx_AllVowelWords = r'\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b'
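
As a minimal sketch of how the first pattern might be used, the snippet below compiles it and runs it over a sample sentence. The sample text and the variable name rx_all_vowel_words are made up for illustration and are not part of the original question.

```python
import re

# First pattern from the answer: word boundaries around a run of ASCII
# letters, preceded by one lookahead per vowel.
rx_all_vowel_words = re.compile(
    r'\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b'
)

# Hypothetical sample text (assumption, chosen for illustration).
text = "An actionable but unfortunate education in sequoia facetious dialogue"

# The pattern has no capturing groups, so findall returns whole matches.
matches = rx_all_vowel_words.findall(text)
print(matches)  # ['education', 'sequoia', 'facetious', 'dialogue']
```

Note that the lookaheads only look for lowercase vowels. If capitalized words such as "Education" should also count, passing re.IGNORECASE as the second argument to re.compile is one option.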

Details

  • \b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b:
    • \b - a word boundary, here, a starting word boundary
    • (?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u) - a sequence of positive lookaheads that are triggered right after the word boundary position is detected, and require the presence of a, e, i, o and u after any 0+ word chars (letters, digits, underscores; you may replace \w*? with [^\W\d_]*? to only check letters)
    • [a-zA-Z]+ - 1 or more ASCII letters (replace with [^\W\d_]+ to match all letters)
    • \b - a word boundary, here, a trailing word boundary
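
To make the role of the lookaheads more concrete, here is a rough sketch that builds the same pattern from a string of vowels and compares it with a plain set check. The helper has_all_vowels and the word list are hypothetical, and the set check is only equivalent for words made up entirely of ASCII letters.

```python
import re

VOWELS = "aeiou"

# One lookahead per vowel, joined together: (?=\w*?a)(?=\w*?e)...
# Each lookahead starts from the same position, so order does not matter.
lookaheads = "".join(rf"(?=\w*?{v})" for v in VOWELS)
pattern = re.compile(rf"\b{lookaheads}[a-zA-Z]+\b")

def has_all_vowels(word: str) -> bool:
    """Plain-Python counterpart: every vowel occurs somewhere in the word."""
    return set(VOWELS) <= set(word)

# Hypothetical sample words (assumption, chosen for illustration).
words = ["education", "actionable", "unfortunate", "sequoia", "rhythm"]

regex_hits = [w for w in words if pattern.fullmatch(w)]
set_hits = [w for w in words if has_all_vowels(w)]
print(regex_hits)  # ['education', 'sequoia']
print(set_hits)    # ['education', 'sequoia']
```

Because all five lookaheads are anchored at the same starting position and consume no text, each one independently scans ahead for its vowel, which is why the vowels may appear in any order.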

The second pattern details:

  • (?<!\S)(?=\S*?a)(?=\S*?e)(?=\S*?i)(?=\S*?o)(?=\S*?u)\S+:
    • (?<!\S) - a position at the start of the string or after a whitespace
    • (?=\S*?a)(?=\S*?e)(?=\S*?i)(?=\S*?o)(?=\S*?u) - all English vowels must be present - in any order - after any 0+ chars other than whitespace
    • \S+ - 1+ non-whitespace chars.
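
A similar sketch for the second pattern follows. The sample text and the name rx_all_vowel_chunks are assumptions made for illustration. Since a "word" here is any run of non-whitespace chars, attached punctuation and digits are part of the match.

```python
import re

# Second pattern from the answer: a whitespace-delimited chunk that
# contains a, e, i, o and u somewhere inside it.
rx_all_vowel_chunks = re.compile(
    r'(?<!\S)(?=\S*?a)(?=\S*?e)(?=\S*?i)(?=\S*?o)(?=\S*?u)\S+'
)

# Hypothetical sample text (assumption, chosen for illustration).
text = "auctioneer's co-equal dialogue, sequoia2 rhythm"

chunks = rx_all_vowel_chunks.findall(text)
print(chunks)  # ["auctioneer's", 'dialogue,', 'sequoia2']
```

If trailing punctuation is unwanted, one possible follow-up is to strip it from each chunk afterwards, for example with str.strip and string.punctuation.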
