Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - findspark.init() IndexError: list index out of range error

When I run the following in a Python 3.5 Jupyter environment, I get the error below. Any idea what is causing it?

import findspark
findspark.init()

Error:

IndexError                                Traceback (most recent call last)
<ipython-input-20-2ad2c7679ebc> in <module>()
      1 import findspark
----> 2 findspark.init()
      3 
      4 import pyspark

/.../anaconda/envs/pyspark/lib/python3.5/site-packages/findspark.py in init(spark_home, python_path, edit_rc, edit_profile)
    132     # add pyspark to sys.path
    133     spark_python = os.path.join(spark_home, 'python')
--> 134     py4j = glob(os.path.join(spark_python, 'lib', 'py4j-*.zip'))[0]
    135     sys.path[:0] = [spark_python, py4j]
    136 

IndexError: list index out of range

1 Reply


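This is most likely because the SPARK_HOME environment variable is not set correctly on your system. The traceback shows findspark globbing for py4j-*.zip under the python/lib directory of the Spark home and taking the first match. If the resolved Spark home does not contain that file (for example, because SPARK_HOME points at the wrong directory), the glob returns an empty list and the [0] index raises the IndexError. Either fix SPARK_HOME, or pass the Spark installation directory directly when initialising findspark, like so: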

import findspark
findspark.init('/path/to/spark/home')

After that, it should all work!
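
If it still fails, it may help to check by hand what findspark is looking for. The following is a minimal diagnostic sketch, not part of findspark: the helper name find_py4j_zips is made up for illustration, and it simply repeats the glob from the traceback above against whatever SPARK_HOME is currently set to.

```python
import os
from glob import glob


def find_py4j_zips(spark_home):
    """Return the py4j-*.zip files under <spark_home>/python/lib.

    Hypothetical helper for illustration; it mirrors the glob pattern
    visible in the traceback. An empty list is the condition under
    which the [0] index in findspark.init() raises IndexError.
    """
    if not spark_home:
        return []
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*.zip")
    return sorted(glob(pattern))


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    zips = find_py4j_zips(spark_home)
    if zips:
        print("SPARK_HOME looks usable; found:", zips[0])
    else:
        print("No py4j-*.zip found under SPARK_HOME =", repr(spark_home))
```

If the second message is printed, SPARK_HOME is probably unset or pointing at something other than the root of a Spark installation; the directory you want is the one that contains python/lib.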

