amazon web services - AWS Glue output file name

Question

amazon web services - AWS Glue output file name

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:49:59+0000

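Spark writes each partition to its own part file (for example part-00000-...) and generates those names itself, so you can't set the output file name directly in the write call. You can, however, rename the file right after it has been written.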

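# Hadoop FileSystem classes, reached through Spark's py4j gateway (sc is the SparkContext).
URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
# bucket_name, source_name, partition_year, partition_week and desired_name
# are assumed to be defined earlier in the job.
fs = FileSystem.get(URI(f"s3://{bucket_name}"), sc._jsc.hadoopConfiguration())

file_path = f"s3://{bucket_name}/processed/source={source_name}/year={partition_year}/week={partition_week}/"
# coalesce(1) makes Spark write a single part file into file_path.
# "compression" is the documented JSON writer option for the output codec.
df.coalesce(1).write.format("json").mode(
    "overwrite").option("compression", "gzip").save(file_path)

# Rename the created part file (assumes exactly one file matches the glob).
created_file_path = fs.globStatus(Path(file_path + "part*.gz"))[0].getPath()
fs.rename(
    created_file_path,
    Path(file_path + f"{desired_name}.jl.gz"))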
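
The [0] index in the rename step assumes that exactly one part file matches the glob. If the write produced no matching file, or more than one, the rename would fail or pick an arbitrary file. A minimal sketch of a guard for that case follows, in plain Python so it can be tried without a Spark session. The helper name pick_single_part_file and the example bucket paths are hypothetical and not part of the original answer.

```python
from fnmatch import fnmatchcase
from posixpath import basename


def pick_single_part_file(paths, pattern="part*.gz"):
    """Return the only path whose file name matches pattern.

    Raises ValueError when zero or several paths match, instead of
    silently taking the first one.
    """
    matches = [p for p in paths if fnmatchcase(basename(p), pattern)]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one file matching {pattern!r}, found {len(matches)}"
        )
    return matches[0]


# Hypothetical listing of an output directory after a coalesce(1) write.
listing = [
    "s3://my-bucket/processed/_SUCCESS",
    "s3://my-bucket/processed/part-00000-abc.json.gz",
]
part_file = pick_single_part_file(listing)
```

In the Glue job, the list of paths could presumably be built from the FileStatus objects returned by fs.globStatus, for example by calling str() on each getPath() result, before passing the chosen path to fs.rename.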
