I'm connected to the cluster over SSH, and I submit the program to the cluster with:
spark-submit --master yarn myProgram.py
I want to save the result in a text file and I tried using the following lines:
counts.write.json("hdfs://home/myDir/text_file.txt")
counts.write.csv("hdfs://home/myDir/text_file.csv")
However, neither works: the program finishes, but I cannot find the text file in myDir. Do you have any idea how I can do this?
Also, is there a way to write directly to my local machine?
EDIT: I found out that the home directory doesn't exist, so now I save the result as:
counts.write.json("hdfs:///user/username/text_file.txt")
But this creates a directory named text_file.txt containing many files with partial results. I want a single file with the final result. Any ideas how I can do this?