Python Regex Sentence Finder-Want to Ignore "a.m."

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

I am developing a regex to find sentences, and I would like to ignore abbreviations that cause the regex to terminate before the end of the sentence. For example, I want to ignore "a.m." so that it returns "At 9:00 a.m. the store opens." instead of "At 9:00 a.m."

import re

def sentence_finder(x):
    # Attempted pattern: the negative lookahead is meant to skip "a.m."
    RegexObject = re.compile(r'[A-Z].+?(?!a\.m\.)\w+[.?!](?!\S)')
    Variable = RegexObject.findall(x)
    return Variable

I get back the following when I run pytest:

def test_pass_Ignore_am():
>       assert DuplicateSentences.sentence_finder("At 9:00 a.m. the store opens.") == ["At 9:00 a.m. the store opens."]
E       AssertionError: assert ['At 9:00 a.m.'] == ['At 9:00 a.m...store opens.']
E         At index 0 diff: 'At 9:00 a.m.' != 'At 9:00 a.m. the store opens.'

What am I doing wrong?

Question from: https://stackoverflow.com/questions/65911590/python-regex-sentence-finder-want-to-ignore-a-m

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:11:08+0000

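The negative lookahead in your pattern is tested at the position where the word characters start (the "m" in "a.m."), and the text ahead of that position is "m.", not "a.m.", so it never blocks the match. You could use a negative lookbehind instead: after matching the closing punctuation, assert that the text immediately before that position is not a.m.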

[A-Z].*?\w[.?!](?<!a\.m\.)(?!\S)

Explanation

[A-Z] Match an uppercase letter A-Z
.*? Match any character except a newline, zero or more times, as few times as possible
\w[.?!] Match a word character followed by ., ? or !
(?<!a\.m\.) Negative lookbehind: assert that the text directly to the left is not a.m.
(?!\S) Assert a whitespace boundary to the right (the next character is whitespace or the end of the string)
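
As a minimal sketch, the pattern can be dropped into the function from the question like this. The constant name SENTENCE_RE and the second test sentence are my own additions for illustration, not part of the original post.

```python
import re

# Pattern from the answer, with the escapes written out explicitly.
# SENTENCE_RE is a hypothetical name chosen for this sketch.
SENTENCE_RE = re.compile(r'[A-Z].*?\w[.?!](?<!a\.m\.)(?!\S)')

def sentence_finder(text):
    """Return all sentence-like matches found in text."""
    return SENTENCE_RE.findall(text)

print(sentence_finder("At 9:00 a.m. the store opens."))
# ['At 9:00 a.m. the store opens.']

# Illustrative input: a sentence that really does end in "a.m."
print(sentence_finder("It opens at 9:00 a.m. Come early! Is that fine?"))
# ['It opens at 9:00 a.m. Come early!', 'Is that fine?']
```

Note the trade-off visible in the second call: because "a.m." is never accepted as a sentence end, a sentence that genuinely ends with "a.m." is merged with the one that follows it. Whether that is acceptable depends on your input text.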
