Welcome To Ask or Share your Answers For Others

Write Spark DataFrame to multiple S3 buckets

posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I want to save the rows of a DataFrame to different S3 buckets. Assume all the buckets already exist. I have a simple DataFrame:

tenantId	charge
tenant1	10
tenant2	20

Question from: https://stackoverflow.com/questions/65850046/write-spark-data-frame-to-multiple-s3-buckets

He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

1 Reply

replied Oct 7, 2021 by 深蓝 (71.8m points)

Use the partitionBy clause when writing:

          df.write.
          partitionBy("tenantId").
          parquet("the root path")

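This creates one subdirectory per distinct value under the root path, named "tenantId=tenant1" and "tenantId=tenant2" (Hive-style partition directories), and writes the matching rows into each. All of them sit under the single root path, so this splits the data by tenant within one location rather than across separate buckets.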
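
If each tenant really does need its own bucket, one possible approach is to filter the DataFrame per tenant and write each subset to its own path. The following is only a sketch under stated assumptions: the bucket naming scheme ("<tenantId>-bucket"), the "charges" prefix, the "s3a://" scheme, and the names WritePerTenant, bucketPathFor and writePerTenant are all hypothetical and not part of the original question. It also assumes the number of distinct tenants is small enough to collect to the driver.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object WritePerTenant {
  // Hypothetical naming scheme: one bucket per tenant, e.g. "s3a://tenant1-bucket/charges".
  def bucketPathFor(tenantId: String): String =
    s"s3a://${tenantId}-bucket/charges"

  // Writes each tenant's rows to that tenant's own bucket.
  def writePerTenant(df: DataFrame): Unit = {
    // Collect the distinct tenant IDs to the driver (assumes a small number of tenants).
    val tenantIds: Array[String] =
      df.select("tenantId").distinct().collect().map(_.getString(0))

    // One filter plus one write job per tenant.
    tenantIds.foreach { tenantId =>
      df.filter(col("tenantId") === tenantId)
        .write
        .parquet(bucketPathFor(tenantId))
    }
  }
}
```

Calling WritePerTenant.writePerTenant(df) runs one Spark write job per tenant, so it may be worth caching df first when the source is expensive to recompute. With the default save mode, each write fails if the target path already exists.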
