python - LookupError: Resource 'corpora/stopwords' not found

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I am trying to run a webapp on Heroku using Flask. The webapp is programmed in Python with the NLTK (Natural Language Toolkit library).

One of the files has the following header:

import nltk, json, operator
from nltk.corpus import stopwords 
from nltk.tokenize import RegexpTokenizer

When the webpage with the stopwords code is called, it produces the following error:

LookupError: 
**********************************************************************
  Resource 'corpora/stopwords' not found.  Please use the NLTK  
  Downloader to obtain the resource:  >>> nltk.download()  
  Searched in:  
    - '/app/nltk_data'  
    - '/usr/share/nltk_data'  
    - '/usr/local/share/nltk_data'  
    - '/usr/lib/nltk_data'  
    - '/usr/local/lib/nltk_data'  
**********************************************************************

The exact code used:

#remove punctuation  
toker = RegexpTokenizer(r'((?<=[^\w\s])\w(?=[^\w\s])|(\W))+', gaps=True)
data = toker.tokenize(data)  

#remove stop words and digits 
stopword = stopwords.words('english')  
data = [w for w in data if w not in stopword and not w.isdigit()]

The webapp on Heroku does not raise the LookupError when the line stopword = stopwords.words('english') is commented out.

The code runs without a glitch on my local computer. I have installed the required libraries on my computer using

pip install -r requirements.txt

The virtual environment provided by Heroku was running when I tested the code on my computer.

I have also tried the NLTK provided by two different sources, but the LookupError is still there. The two sources I used are:
http://pypi.python.org/packages/source/n/nltk/nltk-2.0.1rc4.zip
https://github.com/nltk/nltk.git


1 Reply

深蓝 · Answer 1 · 2021-10-23T21:28:48+0000

The problem is that the corpus ('stopwords' in this case) doesn't get uploaded to Heroku. Your code works on your local machine because it already has the NLTK corpus. Please follow these steps to solve the issue.

1. Create a new directory in your project (let's call it 'nltk_data').
2. Download the NLTK corpus into that directory. You have to set this directory as the download location when you run the download.
3. Tell nltk to look in this path: add nltk.data.path.append('path_to_nltk_data') to the Python file that actually uses nltk, before the corpus is first loaded.
4. Commit the directory and push the app to Heroku.
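The steps above could be sketched roughly as follows. This is only a minimal sketch, not a tested Heroku recipe: the helper names project_nltk_dir, download_stopwords and register_nltk_dir are made up for illustration, and it assumes the 'nltk_data' directory sits next to the Python file that uses nltk. The NLTK calls used are nltk.download with its download_dir argument and the nltk.data.path list.

```python
import os


def project_nltk_dir(anchor_file):
    """Return the 'nltk_data' directory located next to anchor_file (hypothetical helper)."""
    return os.path.join(os.path.dirname(os.path.abspath(anchor_file)), "nltk_data")


def download_stopwords(target_dir):
    """Run once on your local machine: fetch the stopwords corpus into target_dir."""
    import nltk  # imported lazily so the path helper works even without NLTK installed

    os.makedirs(target_dir, exist_ok=True)
    # download_dir makes NLTK store the corpus inside the project
    # instead of the default per-user location.
    return nltk.download("stopwords", download_dir=target_dir)


def register_nltk_dir(target_dir):
    """Run at app start-up: add target_dir to the paths NLTK searches for resources."""
    import nltk

    if target_dir not in nltk.data.path:
        nltk.data.path.append(target_dir)
```

Locally you would call download_stopwords(project_nltk_dir(__file__)) once and commit the resulting directory. In the Flask app, call register_nltk_dir(project_nltk_dir(__file__)) before the first stopwords.words('english') call, so the lookup also searches the directory that was pushed with the code.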

Hope that solves the problem. Worked for me!
