regex - negative lookahead assertion not working in python

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

Task:
- given: a list of image filenames
- todo: create a new list with the filenames that do not contain the word "thumb", i.e. only target the non-thumbnail images (with PIL - Python Imaging Library).

I've tried r".*(?!thumb).*" but it failed.

I've found the solution (here on Stack Overflow): prepend a ^ to the regex and put the .* inside the negative lookahead, giving r"^(?!.*thumb).*", and this now works.

The thing is, I would like to understand why my first solution did not work but I don't. Since regexes are complicated enough, I would really like to understand them.

What I do understand is that the ^ tells the parser that the following condition is to match at the beginning of the string. But doesn't the .* in the (not working) first example also start at the beginning of the string? I thought it would start at the beginning of the string and search through as many characters as it can before reaching "thumb". If so it would return a non-match.

Could someone please explain why r".*(?!thumb).*" does not work but r"^(?!.*thumb).*" does?

Thanks!

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:18:12+0000

Could someone please explain why r".*(?!thumb).*" does not work but r"^(?!.*thumb).*" does?

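The first pattern always matches, because the leading .* greedily consumes the entire string. The negative lookahead is then tested at the end of the string, where nothing follows, so it cannot fail, and the trailing .* matches the empty string. Even if the engine had to backtrack, it would only need one position that is not immediately followed by 'thumb', and every string has one (the end of the string, at the very least). The second pattern is a bit convoluted: the ^ anchors the lookahead at the start of the string, and the .* inside the lookahead scans ahead through the rest of the line looking for 'thumb'. If 'thumb' is present anywhere, the lookahead fails and so does the entire match, as the line does begin with something followed by 'thumb'.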
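To make the difference concrete, here is a small sketch that runs both patterns over a few made-up filenames (the names in the list are illustrative assumptions, not taken from the question). It only uses re.compile and the match method from the standard library.

```python
import re

# Hypothetical filenames, chosen only for illustration.
names = ["cat.jpg", "cat_thumb.jpg", "thumb_dog.png"]

first = re.compile(r".*(?!thumb).*")    # the pattern that "failed"
second = re.compile(r"^(?!.*thumb).*")  # the pattern that works

# The first pattern matches every name: .* swallows the whole string,
# and the lookahead is checked at the end, where nothing follows.
first_hits = [n for n in names if first.match(n)]

# The second pattern rejects any name with 'thumb' anywhere in it.
second_hits = [n for n in names if second.match(n)]

# The text matched by the first pattern is the whole filename,
# 'thumb' included.
whole = first.match("cat_thumb.jpg").group(0)

print(first_hits)
print(second_hits)
print(whole)
```

Printing the matched text for the first pattern shows that the lookahead never gets a chance to look at 'thumb' at all.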

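The second pattern is more easily written as either of:

'thumb' not in string
not re.search('thumb', string)

Note the use of re.search instead of re.match: match only tries the pattern at the start of the string, while search scans the whole string.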
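Applied to the original task, the two simpler forms might look like the sketch below. The function names and the sample list are assumptions made up for this example; only the in operator and re.search come from the answer.

```python
import re

def non_thumbnails(filenames):
    """Keep names that do not contain the substring 'thumb'."""
    return [name for name in filenames if "thumb" not in name]

def non_thumbnails_re(filenames):
    """Same filter, using re.search so the whole name is scanned."""
    pattern = re.compile("thumb")
    return [name for name in filenames if not pattern.search(name)]

# Hypothetical input list, for illustration only.
files = ["cat.jpg", "cat_thumb.jpg", "dog.png", "thumb_dog.png"]
print(non_thumbnails(files))
print(non_thumbnails_re(files))
```

For a fixed substring like this, the plain in test is probably the clearer choice; the regex version only pays off once the condition becomes more than a literal string.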

Also as I mentioned in the comments, your question says:

filenames not containing the word "thumb"

So you may wish to consider whether a name such as thumbs up is supposed to be excluded.
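If only the standalone word should count, one possible approach (an assumption, since the question does not define what a "word" is in a filename) is to require that 'thumb' is not directly surrounded by letters. The pattern and the is_thumbnail helper below are hypothetical names for this sketch. Note that \bthumb\b would not help with names like cat_thumb.jpg, because the underscore counts as a word character in Python's re module.

```python
import re

# Assumption: 'thumb' counts as a word when it has no letter
# immediately before or after it.
THUMB_WORD = re.compile(r"(?<![A-Za-z])thumb(?![A-Za-z])")

def is_thumbnail(name):
    """Return True if 'thumb' appears as a standalone word in name."""
    return THUMB_WORD.search(name) is not None

# Hypothetical filenames, for illustration only.
for name in ["cat_thumb.jpg", "thumb.png", "thumbs_up.png"]:
    print(name, is_thumbnail(name))
```

Under this definition thumbs_up.png is kept, but so is anything named thumbnail, so the right rule depends on how the thumbnail files are actually named.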
