python - Slow performance of POS tagging. Can I do some kind of pre-warming?

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I am using NLTK to POS-tag hundreds of tweets in a single web request. As you know, Django instantiates a request handler for each request.

I noticed that in a request of ~200 tweets, the first tweet takes ~18 seconds to tag, while each subsequent tweet takes ~120 milliseconds. What can I do to speed this up?

Can I do a "pre-warming request" so that the module data is already loaded for each request?

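import nltk

class MyRequestHandler(BaseHandler):
    def read(self, request):  # this runs for a GET request
        # ... (the tweets for this request are obtained here)
        for tweet in tweets:
            tokens = nltk.word_tokenize(tweet)
            tagged = nltk.pos_tag(tokens)  # the first call in each request takes ~18 s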

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:57:53+0000

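Those first 18 seconds are spent unpickling the POS tagger from disk into RAM. To avoid paying that cost on every request, load the tagger yourself once, at module level, outside the request handler. A module is imported once per worker process, so every request served by that process reuses the already-loaded tagger.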

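# NLTK 2.x. nltk.tag._POS_TAGGER does not exist in current NLTK 3 releases, where
# the equivalent is: from nltk.tag import PerceptronTagger; tagger = PerceptronTagger()
import nltk.data, nltk.tag
tagger = nltk.data.load(nltk.tag._POS_TAGGER)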

Then replace nltk.pos_tag(tokens) with tagger.tag(tokens). The trade-off is that app startup now takes about 18 seconds longer.
