I have the following folder structure:

- libfolder
    - lib1.py
    - lib2.py
- main.py
main.py calls libfolder.lib1.py, which in turn calls libfolder.lib2.py and others.
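For clarity, the call chain looks roughly like this (`do_work` and `helper` are placeholder names, and the exact import style inside lib1.py may differ):

```python
# main.py (sits next to libfolder/)
from libfolder import lib1

lib1.do_work()                 # placeholder function name

# libfolder/lib1.py
from libfolder import lib2     # lib1 in turn uses lib2

def do_work():
    lib2.helper()              # placeholder function name
```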
It all works perfectly fine on my local machine, but after I deploy it to Dataproc I get the following error:
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 455, in loads
return pickle.loads(obj, encoding=encoding)
ModuleNotFoundError: No module named 'libfolder'
I have zipped the folder into xyz.zip and run the following command:
spark-submit --py-files=xyz.zip main.py
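For reference, this is roughly how the archive is built; the `__init__.py` line is an assumption on my part (I'm not sure whether it's required, which is part of what I'm asking):

```sh
# Run from the project root so the archive contains libfolder/ itself,
# not just the files inside it.
touch libfolder/__init__.py        # assumption: mark libfolder as a package
zip -r xyz.zip libfolder
spark-submit --py-files=xyz.zip main.py
```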
The serializer is not able to find the location of libfolder. Is there a problem with the way I am packaging my folders?
This issue is similar to this one but it's not answered.
Edit: response to Igor's questions
unzip -l on the zip file returns the following:
- libfolder
- lib1.py
- lib2.py
- main.py
In main.py, lib1 is imported with this statement:
from libfolder import lib1
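As a variant, I could also try shipping the archive at runtime with SparkContext.addPyFile from the standard PySpark API (the path below is just an example; on Dataproc it could also be a gs:// URI):

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Distribute the zipped package to every executor before importing from it.
sc.addPyFile("xyz.zip")

from libfolder import lib1   # import only after the archive has been shipped
```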