Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Problem with start date and scheduled date in Apache Airflow

I am working with Apache Airflow and I have a problem with the schedule and the start date.

I want a DAG to run every day at 8:00 AM UTC. So this is what I did:

from datetime import datetime, timedelta

from airflow import DAG

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 12, 7, 10, 0, 0),
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(hours=5),
}
# never ran
dag = DAG(dag_id='id', default_args=default_args, schedule_interval='0 8 * * *', catchup=True)

I uploaded the DAG on 2020-12-07 and wanted it to run on 2020-12-08 at 08:00:00.

I set the start_date to 2020-12-07 at 10:00:00 so that it would skip 2020-12-07 at 08:00:00 and only trigger the next day, but it didn't work.

What I did then was change the start date:

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 12, 7, 7, 59, 0),
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(hours=5),
}
# never run
dag = DAG(dag_id='etl-ca-cpke-spark_dev_databricks', default_args=default_args, schedule_interval='0 8 * * *', catchup=True)

Now the start date is 1 minute before the DAG should run, and indeed, because catchup is set to True, the DAG has been triggered for 2020-12-07 at 08:00:00, but it has not been triggered for 2020-12-08 at 08:00:00.

Why?


1 Reply


Airflow schedules a run at the END of its interval (see doc reference).

This means that when you set:

start_date: datetime(2020, 12, 7, 8, 0,0)
schedule_interval: '0 8 * * *'

The first run will kick in at around 2020-12-08 08:00 (the exact time depends on available resources).

This run's execution_date will be 2020-12-07 08:00.

The next run will kick in at 2020-12-09 08:00.

That run's execution_date will be 2020-12-08 08:00.

Since today is 2020-12-08, the next run hasn't kicked in yet because its interval has not ENDED.
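To make the arithmetic concrete, here is a minimal sketch in plain Python (no Airflow needed) of the end-of-interval rule for a daily schedule. The helper name daily_runs and its parameters are hypothetical, and it only imitates the behaviour described above: the first execution_date is the first 08:00 at or after start_date, and each run is triggered one interval after its execution_date. It is not Airflow's actual scheduler code.

```python
from datetime import datetime, timedelta


def daily_runs(start_date, hour=8, count=2):
    """Return (execution_date, earliest_trigger_time) pairs.

    Hypothetical helper: imitates the end-of-interval rule for a
    daily schedule at `hour`:00, it does not call Airflow.
    """
    interval = timedelta(days=1)
    # First schedule point at or after start_date.
    first = start_date.replace(hour=hour, minute=0, second=0, microsecond=0)
    if first < start_date:
        first += interval
    # A run for execution_date X is triggered once X + interval has passed.
    return [(first + i * interval, first + (i + 1) * interval)
            for i in range(count)]


# The three start dates discussed in the question and the answer.
for start in (datetime(2020, 12, 7, 10, 0),
              datetime(2020, 12, 7, 7, 59),
              datetime(2020, 12, 7, 8, 0)):
    print("start_date:", start)
    for execution_date, trigger_time in daily_runs(start):
        print("  execution_date", execution_date, "-> triggered at", trigger_time)
```

Under these assumptions, a start_date of 2020-12-07 10:00 gives a first execution_date of 2020-12-08 08:00, so nothing is triggered before 2020-12-09 08:00. With 2020-12-07 07:59 the first run (execution_date 2020-12-07 08:00) is triggered at 2020-12-08 08:00, and the run for 2020-12-08 08:00 only follows at 2020-12-09 08:00.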

