Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
374 views
in Technique[技术] by (71.8m points)

python - ne_chunk without pos_tag in NLTK

I'm trying to chunk a sentence using ne_chunk and pos_tag in nltk.

from nltk import tag
from nltk.tag import pos_tag
from nltk.tree import Tree
from nltk.chunk import ne_chunk

sentence = "Michael and John is reading a booklet in a library of Jakarta"
tagged_sent = pos_tag(sentence.split())

print_chunk = [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]

print print_chunk

and this is the result:

[Tree('GPE', [('Michael', 'NNP')]), Tree('PERSON', [('John', 'NNP')]), Tree('GPE', [('Jakarta', 'NNP')])]

my question, is it possible not to include pos_tag (like NNP above) and only include Tree 'GPE','PERSON'? and what 'GPE' means?

Thanks in advance

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The named entity chunker will give you a tree containing both chunks and tags. You can't change that, but you can take the tags out. Starting from your tagged_sent:

chunks = nltk.ne_chunk(tagged_sent)
simple = []
for elt in chunks:
    if isinstance(elt, Tree):
        simple.append(Tree(elt.label(), [ word for word, tag in elt ]))
    else:
        simple.append( elt[0] )

If you only want the chunks, omit the else: clause in the above. You can adapt the code to wrap the chunks any way you want. I used an nltk Tree to keep the changes to a minimum. Note that some chunks consist of multiple words (try adding "New York" to your example), so the chunk's contents must be a list, not a single element.

PS. "GPE" stands for "geo-political entity" (obviously a chunker mistake). You can see a list of the "commonly used tags" in the nltk book, here.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...