I am developing a regex to find sentences, and I would like to ignore abbreviations that cause the regex to terminate before the end of the sentence. For example, I want to ignore "a.m." so that it returns "At 9:00 a.m. the store opens." instead of "At 9:00 a.m."
    import re

    def sentence_finder(x):
        RegexObject = re.compile(r'[A-Z].+?(?!a.m.)\w+[.?!](?!\S)')
        Variable = RegexObject.findall(x)
        return Variable
I get back the following when I run pytest:
        def test_pass_Ignore_am():
    >       assert DuplicateSentences.sentence_finder("At 9:00 a.m. the store opens.") == ["At 9:00 a.m. the store opens."]
    E       AssertionError: assert ['At 9:00 a.m.'] == ['At 9:00 a.m...store opens.']
    E         At index 0 diff: 'At 9:00 a.m.' != 'At 9:00 a.m. the store opens.'
What am I doing wrong?
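For reference, here is a minimal sketch of one possible direction, not a definitive fix. The likely issue is that the lookahead `(?!a.m.)` is tested at the position where `\w+` starts (just before the `m`), so it never sees the full abbreviation. The sketch instead uses a negative lookbehind placed right before the terminator. The function name `find_sentences` is hypothetical, and the sketch assumes "a.m." is the only abbreviation that needs to be skipped.

```python
import re

# Hypothetical helper: a sketch that assumes "a.m." is the only abbreviation to skip.
# (?<!a\.m) rejects a terminator that directly follows "a.m", so the period
# that closes "a.m." is not treated as the end of a sentence.
SENTENCE_RE = re.compile(r'[A-Z].+?(?<!a\.m)[.?!](?!\S)')

def find_sentences(text):
    """Return the sentences found in text, skipping the period in 'a.m.'."""
    return SENTENCE_RE.findall(text)

print(find_sentences("At 9:00 a.m. the store opens."))
```

One trade-off of this approach: a sentence that actually ends with "a.m." will not be terminated there. Other abbreviations would need their own lookbehinds, for example `(?<!p\.m)`.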
question from:
https://stackoverflow.com/questions/65911590/python-regex-sentence-finder-want-to-ignore-a-m