python - Chunking Stanford Named Entity Recognizer (NER) outputs from NLTK format

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I am using NER in NLTK to find persons, locations, and organizations in sentences. I am able to produce the results like this:

[(u'Remaking', u'O'), (u'The', u'O'), (u'Republican', u'ORGANIZATION'), (u'Party', u'ORGANIZATION')]

Is it possible to chunk consecutive tokens that share the same tag? What I want is something like this:

u'Remaking'/ u'O', u'The'/u'O', (u'Republican', u'Party')/u'ORGANIZATION'

Thanks!

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:10:22+0000

It looks long but it does the work:

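ner_output = [(u'Remaking', u'O'), (u'The', u'O'), (u'Republican', u'ORGANIZATION'), (u'Party', u'ORGANIZATION')]
# prev_tag must exist before the loop; otherwise an input that starts with an entity raises NameError
chunked, prev_tag = [], None
for word_pos in ner_output:
    word, pos = word_pos
    # Extend the previous tuple when the same entity tag repeats
    if pos in ['PERSON', 'ORGANIZATION', 'LOCATION'] and pos == prev_tag:
        chunked[-1] += word_pos
    else:
        chunked.append(word_pos)
    prev_tag = pos

# Merged tuples look like (word, tag, word, tag, ...): join the words, keep the last tag
clean_chunked = [tuple([" ".join(wordpos[::2]), wordpos[-1]]) if len(wordpos) != 2 else wordpos for wordpos in chunked]

print(clean_chunked)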

[out]:

[(u'Remaking', u'O'), (u'The', u'O'), (u'Republican Party', u'ORGANIZATION')]

For more details:

The first for-loop "with memory" achieves something like this:

[(u'Remaking', u'O'), (u'The', u'O'), (u'Republican', u'ORGANIZATION', u'Party', u'ORGANIZATION')]
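
The merge step relies on tuple concatenation: `+=` on a list element rebinds that element to a new, longer tuple. A minimal sketch of just that step, using the sample tokens from the question:

```python
# The list holds one (word, tag) tuple for the first entity token
chunked = [(u'Republican', u'ORGANIZATION')]

# Tuples are immutable, so += builds a new tuple and stores it back in the list
chunked[-1] += (u'Party', u'ORGANIZATION')

print(chunked)
```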

You'll notice that every named entity spanning several tokens now has more than 2 items in its tuple, and what you want are the words joined into a single string, i.e. 'Republican Party' from (u'Republican', u'ORGANIZATION', u'Party', u'ORGANIZATION'). The words sit at the even indices, so you can slice them out like this:

>>> x = [0,1,2,3,4,5,6]
>>> x[::2]
[0, 2, 4, 6]
>>> x[1::2]
[1, 3, 5]

The last element in the NE tuple is the tag you want, so you would do:

>>> x = (u'Republican', u'ORGANIZATION', u'Party', u'ORGANIZATION')
>>> x[::2]
(u'Republican', u'Party')
>>> x[-1]
u'ORGANIZATION'
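
Putting the two slices together gives the collapsed (phrase, tag) pair. This is only a sketch of that single step; the names `merged`, `words`, `tag`, and `collapsed` are made up for illustration:

```python
# A merged tuple as produced by the first loop: word, tag, word, tag, ...
merged = (u'Republican', u'ORGANIZATION', u'Party', u'ORGANIZATION')

words = merged[::2]  # every even index holds a word
tag = merged[-1]     # the last element is the tag shared by the whole chunk

collapsed = (u" ".join(words), tag)
print(collapsed)
```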

It's a little ad hoc and verbose, but I hope it helps. And here it is as a function, Blessed Christmas:

ner_output = [(u'Remaking', u'O'), (u'The', u'O'), (u'Republican', u'ORGANIZATION'), (u'Party', u'ORGANIZATION')]


def rechunk(ner_output):
    # prev_tag starts as None so an input that begins with an entity does not raise NameError
    chunked, prev_tag = [], None
    for word_pos in ner_output:
        word, pos = word_pos
        # Extend the previous tuple when the same entity tag repeats
        if pos in ['PERSON', 'ORGANIZATION', 'LOCATION'] and pos == prev_tag:
            chunked[-1] += word_pos
        else:
            chunked.append(word_pos)
        prev_tag = pos

    # Merged tuples look like (word, tag, word, tag, ...): join the words, keep the last tag
    clean_chunked = [tuple([" ".join(wordpos[::2]), wordpos[-1]])
                     if len(wordpos) != 2 else wordpos for wordpos in chunked]

    return clean_chunked


print(rechunk(ner_output))
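
As an alternative, the same chunking could be sketched with `itertools.groupby` from the standard library, which groups consecutive items that share a key. This swaps in a different technique from the loop above; `rechunk_groupby` and `ENTITY_TAGS` are hypothetical names introduced here, and the tag set is assumed to be the same three tags the answer checks for.

```python
from itertools import groupby

# Assumed tag set, copied from the tags checked in the loop-based version
ENTITY_TAGS = {'PERSON', 'ORGANIZATION', 'LOCATION'}


def rechunk_groupby(ner_output):
    """Merge runs of consecutive tokens that carry the same entity tag."""
    chunked = []
    # groupby yields one group per run of consecutive pairs with an equal tag
    for tag, group in groupby(ner_output, key=lambda pair: pair[1]):
        words = [word for word, _ in group]
        if tag in ENTITY_TAGS:
            # Entity run: join the words into a single phrase
            chunked.append((u" ".join(words), tag))
        else:
            # Non-entity run (e.g. 'O'): keep each token separate
            chunked.extend((word, tag) for word in words)
    return chunked


sample = [(u'Remaking', u'O'), (u'The', u'O'),
          (u'Republican', u'ORGANIZATION'), (u'Party', u'ORGANIZATION')]
print(rechunk_groupby(sample))
```

Like the loop version, this merges any adjacent tokens with the same tag, so two distinct entities of the same type that happen to sit next to each other would end up in one chunk.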
