Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

amazon web services - AWS Glue job generates unexpected extra columns in Redshift when writing to a table

I have a Glue job that copies data from a Data Catalog table to a table I created in Redshift, set up as follows:

  • The catalog table is crawled from an S3 bucket of JSON files. The crawled schema gives every field the string type.

  • The Glue job reads from the catalog, uses ApplyMapping to change the column types, and then writes the result to a Redshift table, as follows:

```
# Cast the catalog's all-string columns to the types of the Redshift table.
Transform0 = ApplyMapping.apply(
    frame=DataSource0,
    mappings=[("fieldA", "string", "fieldA", "string"),
              ("fieldB", "string", "fieldB", "float"),
              ("updated_at", "string", "updated_at", "timestamp")],
    transformation_ctx="Transform0")
# Write the mapped frame to Redshift through the catalog connection.
Datasink0 = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=Transform0,
    catalog_connection="connect-to-Redshift",
    connection_options={"database": "testdbredshift", "dbtable": "testdbredshift_tb"},
    redshift_tmp_dir=args["TempDir"],
    transformation_ctx="DataSink0")
```

In Redshift, I created the table as follows:

```
CREATE TABLE IF NOT EXISTS testdbredshift_tb (
    fieldA varchar DISTKEY,
    fieldB float,
    updated_at timestamp SORTKEY
)
DISTSTYLE KEY;
```

But when I run the Glue job, it usually creates extra columns that I did not define, and the columns I originally created are left empty. The resulting schema looks like this: fieldA, fieldB, fieldC, fieldB_float, fieldC_string
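The suffixed names in that list follow the `<column>_<type>` pattern that Glue's `ResolveChoice` transform documents for its `make_cols` action, which splits a column with an ambiguous type into one column per type. Whether that is what happens here is an assumption, not something confirmed above. A minimal sketch in plain Python that flags such columns in an observed schema; the helper name `find_type_suffixed_columns` and the suffix list are hypothetical, not part of any Glue API:

```python
# Assumed subset of Glue type names that may appear as a column suffix.
GLUE_TYPE_SUFFIXES = ("string", "int", "long", "float", "double", "timestamp")


def find_type_suffixed_columns(observed):
    """Return {column: (base, type)} for names shaped like "<base>_<type>"
    where <base> is itself a column in the same schema."""
    known = set(observed)
    flagged = {}
    for name in observed:
        base, sep, suffix = name.rpartition("_")
        if sep and suffix in GLUE_TYPE_SUFFIXES and base in known:
            flagged[name] = (base, suffix)
    return flagged


# Column list as reported in the question.
observed = ["fieldA", "fieldB", "fieldC", "fieldB_float", "fieldC_string"]
flagged = find_type_suffixed_columns(observed)
print(flagged)
```

Each flagged pair points at a column whose type in the job may not line up with the type in the Redshift table, which narrows down where to look first.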

So how can I keep my schema when loading this data into Redshift through Glue? Thank you so much!
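One approach that may be worth trying, sketched under assumptions rather than offered as a confirmed fix: cast each mapped column explicitly with `ResolveChoice` before the write, so that no column reaches Redshift with an ambiguous type. The helper `cast_specs_from_mappings` is hypothetical; it only derives `(column, "cast:<type>")` pairs from the same tuples passed to `ApplyMapping`. The function `resolve_before_write` assumes it runs inside a Glue job where `awsglue` is importable, and whether every target type (for example `float`) is accepted by `cast:` should be checked against the Glue documentation.

```python
# Same mapping tuples as in the ApplyMapping call from the question.
MAPPINGS = [
    ("fieldA", "string", "fieldA", "string"),
    ("fieldB", "string", "fieldB", "float"),
    ("updated_at", "string", "updated_at", "timestamp"),
]


def cast_specs_from_mappings(mappings):
    """Build ResolveChoice specs: one (target column, "cast:<target type>") per mapping."""
    return [(target, "cast:" + target_type)
            for _source, _source_type, target, target_type in mappings]


def resolve_before_write(frame, mappings):
    """Apply explicit casts to a DynamicFrame. Only usable inside a Glue job."""
    # Imported lazily so the helper above can be tried outside the Glue runtime.
    from awsglue.transforms import ResolveChoice
    return ResolveChoice.apply(
        frame=frame,
        specs=cast_specs_from_mappings(mappings),
        transformation_ctx="ResolveChoice0",  # assumed context name
    )


specs = cast_specs_from_mappings(MAPPINGS)
print(specs)
```

In the job above, this would sit between the two existing steps: call `resolve_before_write(Transform0, MAPPINGS)` and pass the result as `frame` to `from_jdbc_conf`. Separately, Redshift treats `FLOAT` as an alias for `DOUBLE PRECISION`, so mapping `fieldB` to `double` instead of `float` may also be worth testing.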

Question from: https://stackoverflow.com/questions/65839390/aws-glue-job-generate-unexpected-extra-columns-on-redshift-when-writing-to-table


1 Reply

Waiting for answers
