python - Finding Proper Nouns using NLTK WordNet

Question

Welcome To Ask or Share your Answers For Others

python - Finding Proper Nouns using NLTK WordNet

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:58:25+0000

I don't think you need WordNet to find proper nouns, I suggest using the Part-Of-Speech tagger pos_tag.

To find Proper Nouns, look for the NNP tag:

from nltk.tag import pos_tag

sentence = "Michael Jackson likes to eat at McDonalds"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]

propernouns = [word for word,pos in tagged_sent if pos == 'NNP']
# ['Michael','Jackson', 'McDonalds']

You may not be very satisfied since Michael and Jackson is split into 2 tokens, then you might need something more complex such as Name Entity tagger.

By right, as documented by the penntreebank tagset, for possessive nouns, you can simply look for the POS tag, http://www.mozart-oz.org/mogul/doc/lager/brill-tagger/penn.html. But often the tagger doesn't tag POS when it's an NNP.

To find Possessive Nouns, look for str.endswith("'s") or str.endswith("s'"):

from nltk.tag import pos_tag

sentence = "Michael Jackson took Daniel Jackson's hamburger and Agnes' fries"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'), ('Daniel', 'NNP'), ("Jackson's", 'NNP'), ('hamburger', 'NN'), ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]

possessives = [word for word in sentence if word.endswith("'s") or word.endswith("s'")]
# ["Jackson's", "Agnes'"]

Alternatively, you can use NLTK ne_chunk but it doesn't seem to do much other unless you are concerned about what kind of Proper Noun you get from the sentence:

>>> from nltk.tree import Tree; from nltk.chunk import ne_chunk
>>> [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]
[Tree('PERSON', [('Michael', 'NNP')]), Tree('PERSON', [('Jackson', 'NNP')]), Tree('PERSON', [('Daniel', 'NNP')])]
>>> [i[0] for i in list(chain(*[chunk.leaves() for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]))]
['Michael', 'Jackson', 'Daniel']

Using ne_chunk is a little verbose and it doesn't get you the possessives.

Categories

python - Finding Proper Nouns using NLTK WordNet

python - Finding Proper Nouns using NLTK WordNet

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags