Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
914 views
in Technique[技术] by (71.8m points)

python - LookupError: Resource 'corpora/stopwords' not found

I am trying to run a webapp on Heroku using Flask. The webapp is programmed in Python with the NLTK (Natural Language Toolkit library).

One of the file has the following header:

import nltk, json, operator
from nltk.corpus import stopwords 
from nltk.tokenize import RegexpTokenizer 

When the webpage with the stopwords code is called, it produces the following error:

LookupError: 
**********************************************************************
  Resource 'corpora/stopwords' not found.  Please use the NLTK  
  Downloader to obtain the resource:  >>> nltk.download()  
  Searched in:  
    - '/app/nltk_data'  
    - '/usr/share/nltk_data'  
    - '/usr/local/share/nltk_data'  
    - '/usr/lib/nltk_data'  
    - '/usr/local/lib/nltk_data'  
**********************************************************************

The exact code used:

#remove punctuation  
toker = RegexpTokenizer(r'((?<=[^ws])w(?=[^ws])|(W))+', gaps=True) 
data = toker.tokenize(data)  

#remove stop words and digits 
stopword = stopwords.words('english')  
data = [w for w in data if w not in stopword and not w.isdigit()]  

The webapp on Heroku doesn't produce the Lookup error when stopword = stopwords.words('english') is commented out.

The code runs without a glitch on my local computer. I have have installed the required libraries on my computer using

pip install requirements.txt  

The virtual environment provided by Heroku was running when I tested the code on my computer.

I have also tried the NLTK provided by two different sources, but the LookupError is still there. The two sources I used are:
http://pypi.python.org/packages/source/n/nltk/nltk-2.0.1rc4.zip
https://github.com/nltk/nltk.git

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The problem is that the corpus ('stopwords' in this case) doesn't get uploaded to Heroku. Your code works on your local machine because it already has the NLTK corpus. Please follow these steps to solve the issue.

  1. Create a new directory in your project (let's call it 'nltk_data')
  2. Download the NLTK corpus in that directory. You will have to configure that during the download.
  3. Tell nltk to look for this particular path. Just add nltk.data.path.append('path_to_nltk_data') to the Python file that's actually using nltk.
  4. Now push the app to Heroku.

Hope that solves the problem. Worked for me!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...