I am trying to run a webapp on Heroku using Flask. The webapp is written in Python and uses NLTK (the Natural Language Toolkit library).
One of the files has the following header:
import nltk, json, operator
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer
When the webpage with the stopwords code is called, it produces the following error:
LookupError:
**********************************************************************
Resource 'corpora/stopwords' not found. Please use the NLTK
Downloader to obtain the resource: >>> nltk.download()
Searched in:
- '/app/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
**********************************************************************
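To make the failing lookup concrete, here is a minimal standard-library sketch that checks the directories listed under "Searched in:" for the stopwords corpus. The helper name `find_resource_dirs` is hypothetical, and the check only approximates what NLTK does (it assumes the resource is either an unpacked directory or a `.zip` archive of the same name), so treat it as a diagnostic aid rather than NLTK's actual lookup logic.

```python
import os

# Directories copied from the "Searched in:" list of the error message.
SEARCH_DIRS = [
    '/app/nltk_data',
    '/usr/share/nltk_data',
    '/usr/local/share/nltk_data',
    '/usr/lib/nltk_data',
    '/usr/local/lib/nltk_data',
]


def find_resource_dirs(resource, search_dirs):
    """Return the search directories that contain the resource.

    Hypothetical helper: it approximates NLTK's lookup by checking for an
    unpacked directory/file or a .zip archive with the same name.
    """
    found = []
    for base in search_dirs:
        candidate = os.path.join(base, *resource.split('/'))
        if os.path.exists(candidate) or os.path.exists(candidate + '.zip'):
            found.append(base)
    return found


if __name__ == '__main__':
    hits = find_resource_dirs('corpora/stopwords', SEARCH_DIRS)
    print(hits if hits else 'corpora/stopwords not found in any search directory')
```

Running this both locally and on the Heroku dyno should show whether the corpus data exists in any of the searched locations on each machine. Installing the nltk package alone may not place the corpus data there.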
The exact code used:
#remove punctuation
toker = RegexpTokenizer(r'((?<=[^\w\s])\w(?=[^\w\s])|(\W))+', gaps=True)
data = toker.tokenize(data)
#remove stop words and digits
stopword = stopwords.words('english')
data = [w for w in data if w not in stopword and not w.isdigit()]
The webapp on Heroku does not raise the LookupError when the line stopword = stopwords.words('english') is commented out.
The code runs without a glitch on my local computer. I have installed the required libraries on my computer using
pip install -r requirements.txt
The virtual environment provided by Heroku was running when I tested the code on my computer.
I have also tried installing NLTK from two different sources, but the LookupError still occurs. The two sources I used are:
http://pypi.python.org/packages/source/n/nltk/nltk-2.0.1rc4.zip
https://github.com/nltk/nltk.git