
python - ModuleNotFoundError because PySpark serializer is not able to locate library folder

I have the following folder structure:

 - libfolder
    - lib1.py
    - lib2.py
 - main.py

main.py imports libfolder.lib1, which in turn calls into libfolder.lib2 and others.
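
For context, a minimal sketch of what that chain looks like (the function names here are hypothetical, not from my actual code):

    # libfolder/lib1.py
    from libfolder import lib2

    def run(x):
        # delegates to lib2, so both modules must be importable wherever
        # this function ends up running
        return lib2.transform(x)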

It all works perfectly fine on my local machine, but after I deploy it to Dataproc I get the following error:

File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 455, in loads
return pickle.loads(obj, encoding=encoding)
ModuleNotFoundError: No module named 'libfolder'

I have zipped the folder into xyz.zip and run the following command:

spark-submit --py-files=xyz.zip main.py

The serializer is not able to find the location of libfolder. Is there a problem with the way I am packaging my folders?
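
One way I can sanity-check the packaging locally (just a sketch, assuming xyz.zip is in the current directory): if libfolder sits at the top level of the archive, Python should be able to import it straight from the zip once the zip is on sys.path, which is roughly what --py-files arranges on the executors.

    # check_zip.py - hypothetical local check, not part of the deployed code
    import sys

    # Putting the archive itself on sys.path mirrors what --py-files does;
    # the import only works if libfolder/ is at the top level of xyz.zip
    # (having an __init__.py inside it is the safest bet).
    sys.path.insert(0, "xyz.zip")

    from libfolder import lib1  # ModuleNotFoundError here means the layout is wrong
    print(lib1.__file__)        # should point inside xyz.zip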

This issue is similar to this one but it's not answered.

Edit: response to Igor's questions

unzip -l on the zip file returns the following:

 - libfolder
    - lib1.py
    - lib2.py
 - main.py

In main.py, lib1 is imported with this statement:

from libfolder import lib1
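
For reference, a rough sketch of the driver side (app name and function names are hypothetical, not my real code): SparkContext.addPyFile is a programmatic alternative to --py-files, and the ModuleNotFoundError is raised on the executors when the pickled closure that references libfolder is deserialized by pyspark/serializers.py.

    # main.py - minimal sketch, not the original code
    from pyspark import SparkConf, SparkContext

    from libfolder import lib1

    sc = SparkContext(conf=SparkConf().setAppName("xyz"))
    # Alternative to --py-files: ship the archive to every executor so that
    # libfolder is importable when pickled functions are unpickled there.
    sc.addPyFile("xyz.zip")

    rdd = sc.parallelize(range(10))
    # lib1.run is hypothetical; any function from libfolder gets unpickled on
    # the executors, which is where the traceback above comes from.
    print(rdd.map(lib1.run).collect())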

1 Reply

Waiting for answers.
