Is there a way to prevent PySpark from creating several small files when writing a DataFrame to a JSON file?
If I run:
df.write.format('json').save('myfile.json')
or
df.write.json('myfile.json')
it creates a folder named myfile.json, and within it I find several small files named part-***, the HDFS way. Is there any way to have it write a single file instead?
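For reference, here is a minimal sketch of two common workarounds, not a definitive answer. The helper names `write_single_part` and `merge_part_files` are made up for illustration, and `df` is assumed to be a `pyspark.sql.DataFrame`. The first helper uses `coalesce(1)` so Spark writes a single part-* file; the output is still a folder, and all rows pass through one task, so it only suits data that fits on one executor. The second concatenates the part-* files afterwards, assuming the folder sits on a local filesystem (not HDFS) and the files are uncompressed JSON Lines.

```python
import glob
import os
import shutil


def write_single_part(df, out_dir):
    """Write df as JSON into out_dir using a single partition.

    Assumption: df is a pyspark.sql.DataFrame. Spark still creates a
    folder, but it should contain only one part-* data file.
    """
    df.coalesce(1).write.mode("overwrite").json(out_dir)


def merge_part_files(spark_dir, target_file):
    """Concatenate the part-* files of a local Spark output folder.

    Assumption: spark_dir is on the local filesystem and holds
    uncompressed JSON Lines files. Marker files such as _SUCCESS and
    hidden .crc files do not match the part-* pattern and are skipped.
    Returns the number of part files merged.
    """
    part_files = sorted(glob.glob(os.path.join(spark_dir, "part-*")))
    with open(target_file, "wb") as out:
        for path in part_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return len(part_files)
```

Something like `write_single_part(df, 'myfile.json')` would be the simpler route for small data; `merge_part_files('myfile.json', 'merged.json')` keeps the write parallel and produces the single file in a separate step.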