Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Can't find SpaCy model when packaging with PyInstaller

I am using PyInstaller to package a Python script into an .exe. The script uses spacy to load the model en_core_web_sm, which I have already downloaded locally with python -m spacy download en_core_web_sm. The issue is that when PyInstaller packages up my script, it can't find the model. I get the following error: Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.

I thought this might mean that I needed to run the download command from within my Python script to make sure the model is present, but when the script downloads the model it just reports that the requirements are already satisfied.

I also have a hook file that handles the hidden imports and is supposed to bring in the model as well:

from PyInstaller.utils.hooks import collect_all, collect_data_files

datas = []
datas.extend(collect_data_files('en_core_web_sm'))

# ----------------------------- SPACY -----------------------------
data = collect_all('spacy')

datas.extend(data[0])
binaries = data[1]
hiddenimports = data[2]

# ----------------------------- THINC -----------------------------
data = collect_all('thinc')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- CYMEM -----------------------------
data = collect_all('cymem')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- PRESHED -----------------------------
data = collect_all('preshed')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- BLIS -----------------------------

data = collect_all('blis')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- STDNUM -----------------------------

data = collect_all('stdnum')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- OTHER -------------------------------

hiddenimports += ['srsly.msgpack.util']

I use the following code to download the model and then to package the script with PyInstaller:

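import os
import PyInstaller.__main__

# path_to_script is the path of the script being packaged (defined elsewhere)
os.system('python -m spacy download en_core_web_sm')
PyInstaller.__main__.run([path_to_script, '--onefile', '--additional-hooks-dir=.'])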

The hook-spacy.py script is in the same directory as the script that is being packaged by PyInstaller.

All of this works if I run the script locally. It finds the model as it should. I only get this error if I try to package the script with PyInstaller and try to run the .exe.

I am using Python v3.8.7, PyInstaller v4.2, and spacy v3.0.3 with en_core_web_sm v3.0.0.

1 Reply

When you use PyInstaller to collect data files into the bundle as you are doing here, the files are packed into the resulting .exe itself rather than left on disk next to it. For Python code, PyInstaller handles this transparently when import statements are evaluated.

However, for data files you must handle this yourself. For instance, spacy is likely looking for the model in the current working directory. It won’t find your model because it is packed into the .exe instead and therefore isn’t present in the current working directory.
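
To make the location issue concrete: a program built with PyInstaller sets sys.frozen and exposes the folder holding its unpacked data files as sys._MEIPASS (a temporary folder in --onefile mode). The sketch below resolves a model directory relative to that folder instead of the working directory. The helper names bundle_root and find_model_dir are hypothetical, and treating the sub-folder that contains config.cfg as the model's data directory is an assumption about how spaCy v3 model packages are laid out, so check it against your own build.

```python
import sys
from pathlib import Path
from typing import Optional


def bundle_root(dev_root: Path) -> Path:
    """Return the folder that holds bundled data files.

    Inside a PyInstaller build this is sys._MEIPASS; otherwise fall back
    to `dev_root` (for example the project or site-packages folder).
    """
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        return Path(sys._MEIPASS)
    return dev_root


def find_model_dir(root: Path, model_name: str) -> Optional[Path]:
    """Find the data directory of `model_name` under `root`.

    Assumption: the data directory is the one containing config.cfg.
    Returns None when the package folder or config.cfg is missing.
    """
    package_dir = root / model_name
    if not package_dir.is_dir():
        return None
    for config in sorted(package_dir.rglob("config.cfg")):
        return config.parent
    return None
```

If find_model_dir returns a path, that path could be passed to spacy.load in place of the bare model name; if it returns None, the data files were probably not collected into the bundle at all.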

You will need to use this API:

https://pyinstaller.readthedocs.io/en/stable/spec-files.html#using-data-files-from-a-module

This allows you to read a data file from the exe that PyInstaller creates. You can then write it to the current working directory and then spacy should be able to find it.
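
The linked documentation section describes reading package data with pkgutil.get_data. A minimal sketch of the read-then-write step described above might look like the following. extract_bundled_file is a hypothetical helper, not part of PyInstaller or spacy, and the package and resource names you pass in depend on what your hook actually collected.

```python
import pkgutil
from pathlib import Path


def extract_bundled_file(package: str, resource: str, dest_dir: str = ".") -> Path:
    """Copy one data file shipped inside `package` into `dest_dir`.

    `resource` is a '/'-separated path relative to the package folder.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        # get_data returns None when the package cannot be located or its
        # loader does not support reading data.
        raise FileNotFoundError(f"{resource!r} not available from {package!r}")
    target = Path(dest_dir) / resource
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

A spacy model consists of many files, so this helper would have to be called once per file, and the resulting folder (rather than the model name) would then be given to spacy.load.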

