
python - Add Jar to standalone pyspark

I'm launching a pyspark program:

$ export SPARK_HOME=
$ export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.9-src.zip
$ python

And the Python code:

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("Example").setMaster("local[2]")
sc = SparkContext(conf=conf)

How do I add JAR dependencies such as the Databricks CSV JAR? Using the command line, I can add the package like this:

$ pyspark/spark-submit --packages com.databricks:spark-csv_2.10:1.3.0 

But I'm not using any of these. The program is part of a larger workflow that does not use spark-submit; I should be able to run my ./foo.py program and have it just work.

  • I know you can set the Spark extraClassPath properties, but don't you then have to copy the JAR files to each node?
  • I also tried conf.set("spark.jars", "jar1,jar2"); that didn't work either, failing with a py4j ClassNotFoundException (see the sketch after this list).
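For reference, here is a minimal reconstruction of that second attempt; the JAR paths are hypothetical placeholders:

from pyspark import SparkContext, SparkConf

# Reconstruction of the attempt described above: pointing spark.jars at
# local JAR paths (hypothetical placeholders) before the context is created.
conf = (SparkConf()
        .setAppName("Example")
        .setMaster("local[2]")
        .set("spark.jars", "/path/to/jar1.jar,/path/to/jar2.jar"))
sc = SparkContext(conf=conf)  # this is where the py4j ClassNotFoundException appeared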

1 Reply


Updated 2021-01-19

There are many approaches here (setting ENV vars, adding to $SPARK_HOME/conf/spark-defaults.conf, etc.), and other answers already cover them. I wanted to add an answer for those specifically wanting to do this from within a Python script or Jupyter notebook.
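For example, the env-var route can look like the sketch below, reusing the spark-csv coordinates from the question; PYSPARK_SUBMIT_ARGS must be set before pyspark is first imported:

import os

# Must be set before the first pyspark import; the trailing
# 'pyspark-shell' token is required by PySpark's launcher.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages com.databricks:spark-csv_2.10:1.3.0 pyspark-shell'
)

from pyspark import SparkContext
sc = SparkContext('local[2]', 'Example')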

When you create the Spark session, you can add a .config() that pulls in the specific JAR (in my case I wanted the Kafka package loaded):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .appName('my_awesome')
    .config('spark.jars.packages', 'org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1')
    .getOrCreate())

With this in place I didn't need to do anything else (no ENV vars or conf-file changes).

  • Note 1: The JAR will be downloaded dynamically; you don't need to download it manually.
  • Note 2: Make sure the versions match what you want: in the example above my Spark version is 3.0.1, so the package coordinate ends with :3.0.1.
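Once the session is up, the package is on the classpath. As a quick sketch of using it, here is a streaming read from Kafka (the broker address and topic name are hypothetical):

# The 'kafka' source below is provided by the spark-sql-kafka
# package pulled in through spark.jars.packages above.
df = (spark.readStream
    .format('kafka')
    .option('kafka.bootstrap.servers', 'localhost:9092')  # hypothetical broker
    .option('subscribe', 'my_topic')                      # hypothetical topic
    .load())

If the dependency is a local JAR file rather than Maven coordinates, the analogous key is spark.jars, which takes a comma-separated list of paths.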
