I am trying to use Spark with Python. I installed the Spark 1.0.2 for Hadoop 2 binary distribution from the downloads page. I can run through the quickstart examples in Python interactive mode, but now I'd like to write a standalone Python script that uses Spark. The quick start documentation says to just `import pyspark`, but this doesn't work because it's not on my PYTHONPATH.
I can run `bin/pyspark` and see that the module is installed beneath `SPARK_DIR/python/pyspark`. I can manually add this to my PYTHONPATH environment variable, but I'd like to know the preferred automated method.
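For reference, this is roughly what my manual workaround looks like today. The paths and names here are illustrative, not anything documented by Spark:

```python
# Rough sketch of the manual workaround: put the Spark Python directory on
# sys.path before importing pyspark. SPARK_HOME and the fallback path are
# assumptions -- adjust to wherever the binary distribution was unpacked.
import os
import sys

spark_home = os.environ.get("SPARK_HOME", "/opt/spark-1.0.2-bin-hadoop2")
sys.path.insert(0, os.path.join(spark_home, "python"))
# Depending on the Spark version, the bundled py4j zip under
# SPARK_HOME/python/lib may also need to be added to sys.path.

from pyspark import SparkContext

sc = SparkContext("local", "StandaloneExample")
print(sc.parallelize(range(100)).sum())  # sanity check: prints 4950
sc.stop()
```

This works, but hard-coding the install path into every script is exactly what I'm hoping to avoid.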
What is the best way to add `pyspark` support for standalone scripts? I don't see a `setup.py` anywhere under the Spark install directory. How would I create a pip package for a Python script that depends on Spark?
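To make the question concrete, this is the sort of `setup.py` I expected to find or be able to write myself. It is entirely hypothetical (the package and module names are made up), and the sticking point is the empty dependency list:

```python
# Hypothetical setup.py for a script that depends on Spark. Nothing like
# this ships with the binary distribution.
from setuptools import setup

setup(
    name="my-spark-job",          # hypothetical package name
    version="0.1.0",
    py_modules=["my_spark_job"],  # hypothetical module containing the script
    # pyspark is not published on PyPI for this release, so it can't simply
    # be listed in install_requires -- which is essentially my question.
    install_requires=[],
)
```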