csv - How to load jar dependencies in IPython Notebook

Question

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

This page inspired me to try spark-csv for reading .csv files in PySpark. I found a couple of posts, such as this one, describing how to use spark-csv.

But I am not able to initialize the IPython instance with either the .jar file or the package extension included at start-up, as can be done with spark-shell.

That is, instead of

ipython notebook --profile=pyspark

I tried out

ipython notebook --profile=pyspark --packages com.databricks:spark-csv_2.10:1.0.3

but it is not supported.

Please advise.

1 Reply

深蓝 · Answer 1 · 2021-10-16T22:16:36+0000

You can simply pass it in the PYSPARK_SUBMIT_ARGS variable. For example:

export PACKAGES="com.databricks:spark-csv_2.11:1.3.0"
export PYSPARK_SUBMIT_ARGS="--packages ${PACKAGES} pyspark-shell"

This property can also be set dynamically in your code, before the SparkContext / SparkSession and the corresponding JVM have been started:

import os

packages = "com.databricks:spark-csv_2.11:1.3.0"

os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages {0} pyspark-shell".format(packages)
)
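
If you need more than one package, or a local .jar file as well, the same idea can be wrapped in a small helper. The following is only a sketch: `build_submit_args` is a hypothetical name of my own, not part of PySpark, and it assumes that `--packages` and `--jars` each take a comma-separated list, as they do for spark-submit. It has to run before any SparkContext is created in the notebook.

```python
import os


def build_submit_args(packages, jars=()):
    """Build a PYSPARK_SUBMIT_ARGS value (hypothetical helper, not a PySpark API).

    packages: Maven coordinates such as "group:artifact:version"
    jars:     paths to local .jar files
    """
    parts = []
    if packages:
        # --packages takes a comma-separated list of Maven coordinates
        parts += ["--packages", ",".join(packages)]
    if jars:
        # --jars takes a comma-separated list of jar paths
        parts += ["--jars", ",".join(jars)]
    # The value must end with pyspark-shell, as in the answer above
    parts.append("pyspark-shell")
    return " ".join(parts)


# Set the variable before the SparkContext (and its JVM) is started
submit_args = build_submit_args(["com.databricks:spark-csv_2.11:1.3.0"])
os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args
print(submit_args)
```

Note that the Scala suffix in the coordinate (`_2.10` or `_2.11`) should match the Scala version your Spark build was compiled against.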
