I am trying to create a matcher that finds negated custom entities in text. It works fine for entities that span a single token, but I am having trouble capturing entities that span more than one token.
As an example, let's say that my custom entities are animals, labeled as token.ent_type_ = "animal":
["cat", "dog", "artic fox"]
(Note that the last entity has two words.)
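The question does not say how the "animal" labels are produced. One possible setup is spaCy's EntityRuler. The sketch below assumes spaCy v3 and a blank English pipeline, and neither of these comes from the question. It shows that every token of a multi-token entity carries the same ent_type_.

```python
import spacy

# Assumption: spaCy v3 and a blank English pipeline; the question does not
# state how its "animal" entities are created.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
# String patterns are phrase patterns, so "artic fox" becomes one
# entity spanning two tokens.
ruler.add_patterns([
    {"label": "animal", "pattern": "cat"},
    {"label": "animal", "pattern": "dog"},
    {"label": "animal", "pattern": "artic fox"},
])

doc = nlp("There is no cat in the house and no artic fox in the basement")
for token in doc:
    if token.ent_type_:
        # ent_iob_ is "B" on the first token of an entity and "I" on the rest
        print(token.text, token.ent_type_, token.ent_iob_)
```

Both "artic" and "fox" get ent_type_ "animal" ("B" followed by "I"). A token pattern therefore needs a quantifier to cover the whole entity.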
Now I want to find those entities when they are negated, so I created a simple matcher with the following pattern:
[{'lower': 'no'}, {'ENT_TYPE': {'REGEX': 'animal'}, 'OP': '+'}]
For example, I have the following text:
There is no cat in the house and no artic fox in the basement
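The sketch below runs this pattern on the example sentence. It assumes spaCy v3 and a hypothetical EntityRuler setup for the "animal" labels, since the question does not say how they are produced. "OP" belongs in the token dict next to "ENT_TYPE" and not inside the REGEX predicate. Without a greedy filter, the Matcher returns every match the "+" quantifier allows, including overlapping ones.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: spaCy v3 plus a hypothetical EntityRuler for the labels.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "animal", "pattern": p} for p in ["cat", "dog", "artic fox"]])
doc = nlp("There is no cat in the house and no artic fox in the basement")

matcher = Matcher(nlp.vocab)
# "OP" is a key of the token dict itself, alongside "ENT_TYPE".
pattern = [{"LOWER": "no"}, {"ENT_TYPE": {"REGEX": "animal"}, "OP": "+"}]
matcher.add("NEG_ANIMAL", [pattern])

# No greedy filter: both "no artic" and "no artic fox" come back.
spans = matcher(doc, as_spans=True)
print(sorted(span.text for span in spans))
```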
I can successfully capture "no cat" and "no artic", but the last match is incorrect, as the full match should be "no artic fox". This is due to the 'OP': '+' in the pattern, which also yields a match that covers only one token of the custom entity instead of both. How can I modify the pattern to prioritize longer matches over shorter ones?
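The sketch below shows two common ways to keep only the longest match. It makes the same assumptions: spaCy v3 and a hypothetical EntityRuler for the labels. The first option passes greedy="LONGEST" to Matcher.add. The second collects all matches and filters them afterwards with spacy.util.filter_spans.

```python
import spacy
from spacy.matcher import Matcher
from spacy.util import filter_spans

# Assumption: spaCy v3 plus a hypothetical EntityRuler for the labels.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "animal", "pattern": p} for p in ["cat", "dog", "artic fox"]])
doc = nlp("There is no cat in the house and no artic fox in the basement")

pattern = [{"LOWER": "no"}, {"ENT_TYPE": {"REGEX": "animal"}, "OP": "+"}]

# Option 1: the Matcher keeps only the longest of overlapping matches.
greedy_matcher = Matcher(nlp.vocab)
greedy_matcher.add("NEG_ANIMAL", [pattern], greedy="LONGEST")
# Sort by start explicitly rather than relying on the returned order.
longest = sorted(greedy_matcher(doc, as_spans=True), key=lambda span: span.start)

# Option 2: collect every match, then drop overlaps, preferring longer spans.
plain_matcher = Matcher(nlp.vocab)
plain_matcher.add("NEG_ANIMAL", [pattern])
filtered = filter_spans(plain_matcher(doc, as_spans=True))

print([span.text for span in longest])   # ['no cat', 'no artic fox']
print([span.text for span in filtered])  # ['no cat', 'no artic fox']
```

The greedy option is set per match key when the pattern is added. filter_spans removes overlaps from any list of spans, whichever patterns produced them. That can be useful when several negation patterns might overlap.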