Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

airflow - Getting the date of the most recent successful DAG execution

I am looking to create a transform in Airflow, and I want to make sure I pick up all data from my source since the last time the DAG ran, in order to update my target table. To do this, I need to get the date of the most recent execution that was successful.

I have found this: Apache airflow macro to get last dag run execution time, which gets me partway to the end goal. However, it only returns the last time the DAG executed, regardless of whether that run was successful.

SELECT col1, col2, col3
FROM schema.table
WHERE table.updated_at > '{{ last_dag_run_execution_date(dag) }}';

If an execution fails (due to connectivity or something similar), last_dag_run_execution_date(dag) will still update, so the next run skips the data that the failed run should have picked up.

Ideally, this would pull the most recent non-failed execution. If anyone has ideas on how I can achieve this, please let me know.

1 Reply

I ended up changing the function from the referenced question to use latest_execution_date, which is a property available on the DAG object in Airflow (rather than a template macro), like so:

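def get_last_dag_run(dag):
    # Latest execution date recorded for this DAG, or None if it has never run.
    last_dag_run = dag.latest_execution_date
    if last_dag_run is None:
        # Fall back to a fixed early date so the first run picks up all rows.
        return '2013-01-01'
    return last_dag_run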

Seems to be working for me at the moment.
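
One caveat: as far as I understand the Airflow docs, latest_execution_date reports the latest date for which any DAG run exists, so on its own it may not exclude failed runs. If the run state matters, a rough sketch of filtering explicitly is below. The `Run` tuple and the function name `last_successful_execution_date` are hypothetical stand-ins made up for illustration; the function only assumes each run object exposes `execution_date` and `state` attributes, which Airflow's DagRun objects do.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for Airflow's DagRun; only these two attributes are used.
Run = namedtuple("Run", ["execution_date", "state"])


def last_successful_execution_date(runs, default=datetime(2013, 1, 1)):
    """Return the newest execution_date among successful runs, else default."""
    successful = [r.execution_date for r in runs if r.state == "success"]
    return max(successful) if successful else default


# Example data: the most recent run failed, so the earlier success is chosen.
runs = [
    Run(datetime(2020, 1, 1), "success"),
    Run(datetime(2020, 1, 2), "failed"),
]
print(last_successful_execution_date(runs))  # 2020-01-01 00:00:00
```

Inside Airflow, the list of runs could come from something like `DagRun.find(dag_id=dag.dag_id, state="success")`, assuming the installed version supports those arguments; in that case the state check in the function is redundant but harmless.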

