I am working with Apache Airflow and I am running into a problem with the schedule and the start date.
I want a DAG to run every day at 8:00 AM UTC, so this is what I did:
from datetime import datetime, timedelta
from airflow import DAG

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 12, 7, 10, 0, 0),
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(hours=5)
}

# never ran
dag = DAG(dag_id='id', default_args=default_args, schedule_interval='0 8 * * *', catchup=True)
The day I uploaded the DAG was 2020-12-07, and I wanted it to run for the first time on 2020-12-08 at 08:00:00.
I set the start_date to 2020-12-07 at 10:00:00 to avoid a run for 2020-12-07 at 08:00:00 and only trigger it the next day, but it did not work: nothing ran at all.
What I did then was modify the start date:
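To double-check that the schedule itself made sense, I looked at the first cron tick after that start_date with croniter (the cron library Airflow uses); this is only a sketch of the cron expression, not of how the scheduler is actually invoked:

from datetime import datetime
from croniter import croniter

# First tick of '0 8 * * *' strictly after the start_date above.
start = datetime(2020, 12, 7, 10, 0, 0)
print(croniter('0 8 * * *', start).get_next(datetime))
# -> 2020-12-08 08:00:00, exactly when I expected the first run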
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 12, 7, 7, 59, 0),
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(hours=5)
}

# only triggered once, for 2020-12-07
dag = DAG(dag_id='etl-ca-cpke-spark_dev_databricks', default_args=default_args, schedule_interval='0 8 * * *', catchup=True)
Now the start date is one minute before the DAG should run, and indeed, because catchup is set to True, a run was triggered for 2020-12-07 at 08:00:00, but no run has been triggered for 2020-12-08 at 08:00:00.
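For reference, here is the timeline as I understand it, written as plain datetime arithmetic; the assumption (which may be exactly what I am missing) is that Airflow only triggers a run once its schedule interval has ended:

from datetime import datetime, timedelta

interval = timedelta(days=1)  # '0 8 * * *' gives daily intervals

# The interval starting 2020-12-07 08:00 ends a day later, and that run did fire.
run_for_dec_7 = datetime(2020, 12, 7, 8, 0, 0)
print(run_for_dec_7 + interval)  # 2020-12-08 08:00:00 (observed)

# Under this assumption, the interval starting 2020-12-08 08:00 would only fire
# at its end, which would explain why nothing has run for 2020-12-08 yet.
run_for_dec_8 = datetime(2020, 12, 8, 8, 0, 0)
print(run_for_dec_8 + interval)  # 2020-12-09 08:00:00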
Why?