I'm launching a pyspark program:
$ export SPARK_HOME=
$ export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.9-src.zip
$ python
And the Python code:
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("Example").setMaster("local[2]")
sc = SparkContext(conf=conf)
How do I add jar dependencies, such as the Databricks spark-csv jar? On the command line, with either pyspark or spark-submit, I can add the package like this:
$ pyspark --packages com.databricks:spark-csv_2.10:1.3.0
$ spark-submit --packages com.databricks:spark-csv_2.10:1.3.0
But I'm not using either of these. The program is part of a larger workflow that does not use spark-submit; I should be able to run my ./foo.py program directly and have it just work.
- I know you can set the extraClassPath Spark properties, but don't you have to copy the JAR files to each node for that?
- I also tried conf.set("spark.jars", "jar1,jar2"), but that didn't work either; it failed with a py4j ClassNotFoundException. A minimal version of that attempt is sketched below.
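For reference, this is roughly what that failed attempt looked like. The jar paths are placeholders, not the real locations used in the workflow:

from pyspark import SparkContext, SparkConf

# Placeholder paths -- substitute wherever the jars actually live.
jars = "/path/to/spark-csv_2.10-1.3.0.jar,/path/to/commons-csv-1.1.jar"

conf = (SparkConf()
        .setAppName("Example")
        .setMaster("local[2]")
        # Comma-separated list of jars to put on the classpath; this is the
        # setting that still ended in a py4j ClassNotFoundException for me.
        .set("spark.jars", jars))

sc = SparkContext(conf=conf)

# The exception only surfaces once the csv data source is actually used, e.g.:
# sqlContext.read.format("com.databricks.spark.csv").load("file.csv")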