I'm connected to the cluster over SSH, and I submit the program to the cluster with:
spark-submit --master yarn myProgram.py
I want to save the result in a text file and I tried using the following lines:
counts.write.json("hdfs://home/myDir/text_file.txt")
counts.write.csv("hdfs://home/myDir/text_file.csv")
However, neither works: the program finishes, but I cannot find the text file in myDir. Do you have any idea how I can do this?
Also, is there a way to write directly to my local machine?
EDIT: I found out that the home directory doesn't exist, so now I save the result as:
counts.write.json("hdfs:///user/username/text_file.txt")
But this creates a directory named text_file.txt containing many files with partial results. I want a single file with the final result. Any ideas how I can do this?