
How to really stop a DAG from running on startup

Hello,

We use Airflow on a daily basis, but everything runs within a few hours in the morning. Once everything has run, a GitLab pipeline first takes a snapshot of the Composer environment and then shuts it down. The next morning, another pipeline starts the environment and loads the latest snapshot, so everything is the same.

The only problem we have is that DAGs run once on startup even with catchup set to False, and even if they are disabled. For instance, I was testing a DAG scheduled every two hours (while Airflow was up, of course) and I disabled it a couple of days ago. It still runs once a day, as soon as Airflow starts.

The same happens with some DAGs I was testing that should run every day (again, catchup set to False) while Airflow was running only on weekdays (during our first tests). On Monday those DAGs ran as soon as Airflow started (showing as if they had run the day before, on Sunday) and then ran again at their normal scheduled time.

Timezone settings are fine and I'm only at UTC-3 but running things mid morning so there's no chance of having trouble there.

Any clues?

Thanks in advance!


Hi @caracena,

Welcome to Google Cloud Community!

It seems you're experiencing a common issue with Airflow DAG scheduling, specifically DAGs running once on startup despite catchup=False and despite being disabled.

Here are some possible reasons and suggestions that may help resolve the issue:

  • With catchup=False, the scheduler skips older missed intervals but still creates one run on startup for the most recent completed interval that has not yet been executed, calculated from the start_date and the schedule interval. Ensure your start_date and schedule interval are aligned with your intended scheduling logic. You can refer to this documentation for example scenarios and complete guidance.
  • Check the scheduler logs for the DAG's scheduling attempts; they can provide more insight into when and why the run is created.
  • When you disable (pause) a DAG, confirm that the paused state is actually saved in the Airflow metadata database, so that it is still in effect after the snapshot is loaded.
  • Consider setting max_active_runs=1 as a workaround to limit how many runs of the DAG can be active at once.
  • Another workaround is to manually set your start_date to a future date. Be cautious, though, as this requires regular updates to avoid unwanted DAG runs. You can check this discussion as an additional reference.
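
To make the first point more concrete, here is a minimal sketch in plain Python of how a single run for the latest interval can appear on startup. It is a simplified model of the documented catchup=False behavior, not Airflow's actual scheduler code, and the function names, dates, and midnight-aligned daily schedule are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def latest_completed_interval(start_date, interval, now):
    """Return (start, end) of the most recent fully elapsed interval, or None."""
    if now < start_date + interval:
        return None  # the first interval has not finished yet
    full_intervals = (now - start_date) // interval
    end = start_date + full_intervals * interval
    return end - interval, end


def run_created_on_startup(start_date, interval, last_run_interval_start, now):
    """Simplified model of catchup=False: at most one run, for the latest interval."""
    latest = latest_completed_interval(start_date, interval, now)
    if latest is None:
        return None
    if last_run_interval_start is not None and latest[0] <= last_run_interval_start:
        return None  # the latest interval already ran, so nothing is created
    return latest


# Hypothetical daily DAG whose intervals begin at midnight UTC.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)        # a Monday
last_run = datetime(2024, 1, 4, tzinfo=timezone.utc)     # Thursday's interval, executed on Friday
startup = datetime(2024, 1, 8, 11, tzinfo=timezone.utc)  # environment restarts Monday 11:00 UTC

# Friday's and Saturday's intervals are skipped; only Sunday's is created.
pending = run_created_on_startup(start, timedelta(days=1), last_run, startup)
print(pending)
```

Under these assumptions, the environment coming back on Monday yields exactly one run, covering Sunday's interval, which matches the run that shows up as if it had executed the day before. If the latest interval had already run before the shutdown, the same function returns None and nothing would be created.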

If the issue persists, I recommend reaching out to Google Cloud Support for further assistance, as they can provide insights into whether this behavior is specific to your project.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.