amazon web services - How to look for updated rows when using AWS Glue?

Question

Welcome To Ask or Share your Answers For Others

amazon web services - How to look for updated rows when using AWS Glue?

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

amazon web services - How to look for updated rows when using AWS Glue?

I'm trying to use Glue for ETL on data I'm moving from RDS to Redshift.

As far as I am aware, Glue bookmarks only look for new rows using the specified primary key and does not track updated rows.

However that data I am working with tends to have rows updated frequently and I am looking for a possible solution. I'm a bit new to pyspark, so if it is possible to do this in pyspark I'd highly appreciate some guidance or a point in the right direction. If there's a possible solution outside of Spark, I'd love to hear it as well.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:56:00+0000

You can use the query to find the updated records by filtering data at source JDBC database as shown below example. I have passed date as an argument so that for each run I can fetch only latest values from mysql database in this example.

query= "(select ab.id,ab.name,ab.date1,bb.tStartDate from test.test12 ab join test.test34 bb on ab.id=bb.id where ab.date1>'" + args['start_date'] + "') as testresult"

datasource0 = spark.read.format("jdbc").option("url", "jdbc:mysql://host.test.us-east-2.rds.amazonaws.com:3306/test").option("driver", "com.mysql.jdbc.Driver").option("dbtable", query).option("user", "test").option("password", "Password1234").load()

Categories

amazon web services - How to look for updated rows when using AWS Glue?

amazon web services - How to look for updated rows when using AWS Glue?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags