python - Resource 'corpora/wordnet' not found on Heroku


posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'm trying to get NLTK and wordnet working on Heroku. I've already done the following:

heroku run python
nltk.download()
  wordnet
pip install -r requirements.txt

But I get this error:

Resource 'corpora/wordnet' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - '/app/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'

Yet I've looked in /app/nltk_data and it's there, so I'm not sure what's going on.



1 Reply

深蓝 · Answer 1 · 2021-10-23T17:47:31+0000

I just had this same problem. (Anything downloaded inside a heroku run session is written to that one-off dyno's ephemeral filesystem, so it is gone once the session ends.) What ended up working for me was creating an 'nltk_data' directory in the application's folder itself, downloading the corpus to that directory, and adding a line to my code that tells nltk to look in that directory. You can do all of this locally and then push the changes to Heroku.

So, supposing my Python application is in a directory called "myapp/":

Step 1: Create the directory

cd myapp/
mkdir nltk_data

Step 2: Download Corpus to New Directory

python -m nltk.downloader

This will bring up the nltk downloader. Set your Download Directory to whatever_the_absolute_path_to_myapp_is/nltk_data/. If you're using the GUI downloader, the download directory is set through a text field at the bottom of the UI. If you're using the command-line one, you set it in the config menu.

Once the downloader knows to point to your newly created nltk_data directory, download your corpus.

Or in one step from Python code:

import nltk
nltk.download("wordnet", download_dir="whatever_the_absolute_path_to_myapp_is/nltk_data/")
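
Before committing, it can help to confirm that the download actually landed in the new directory. The following is only a minimal sketch: has_resource is a hypothetical helper (not part of nltk), and it assumes the downloader leaves either an unpacked corpora/wordnet folder or a corpora/wordnet.zip archive under the data directory.

```python
from pathlib import Path


def has_resource(data_dir, resource="corpora/wordnet"):
    """Return True if `resource` appears to be present under `data_dir`.

    Hypothetical helper, not an nltk API. It only inspects the filesystem:
    it accepts either an unpacked directory (corpora/wordnet) or the
    downloaded archive (corpora/wordnet.zip).
    """
    base = Path(data_dir)
    unpacked = base / resource
    zipped = base / (resource + ".zip")
    return unpacked.is_dir() or zipped.is_file()
```

For example, has_resource("nltk_data") run from inside myapp/ should return True after Step 2; if it returns False, the corpus probably went to a different download directory.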

Step 3: Let nltk Know Where to Look

nltk looks for data, resources, etc. in the locations listed in the nltk.data.path variable. All you need to do is add nltk.data.path.append('./nltk_data/') to the Python file that actually uses nltk, before it loads any corpus, and nltk will look for corpora, tokenizers, and such in there in addition to the default paths.

Step 4: Send it to Heroku

git add nltk_data/
git commit -m 'super useful commit message'
git push heroku master

That should work! It did for me, anyway. One thing worth noting: a relative path such as './nltk_data/' is resolved against the process's working directory, so the right value depends on how you've structured your application and where it is started from. Account for that when you call nltk.data.path.append('path_to_nltk_data').
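
One way to sidestep the working-directory issue is to build the path from the location of the Python file itself. This is a sketch under assumptions, not the method used above: nltk_data_dir, register_data_dir and configure_nltk are hypothetical names, and the code assumes nltk_data/ sits next to the file that calls it.

```python
from pathlib import Path


def nltk_data_dir(module_file, dirname="nltk_data"):
    """Absolute path of `dirname` located next to `module_file` (hypothetical helper)."""
    return str(Path(module_file).resolve().parent / dirname)


def register_data_dir(search_path, data_dir):
    """Append `data_dir` to the list `search_path` unless it is already there."""
    if data_dir not in search_path:
        search_path.append(data_dir)
    return search_path


def configure_nltk(module_file):
    """Add the bundled nltk_data directory to nltk's search path.

    nltk is imported lazily so the helpers above stay usable without it.
    """
    import nltk

    return register_data_dir(nltk.data.path, nltk_data_dir(module_file))
```

Calling configure_nltk(__file__) near the top of the file that uses nltk, before any corpus is accessed, plays the same role as the nltk.data.path.append(...) line from Step 3, but it does not depend on which directory the process was started from.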
