
Composer (Airflow) timezones: changing the Airflow core timezone does not take effect

Hello,

I am using GCP Composer (Airflow). What I want is for Airflow, in all its components, to reflect my time zone: "Europe/Lisbon". I know that, by default, Composer handles dates and times in UTC, so I have already taken some steps to change that, but I haven't been able to change it in all components.

What I already did was:

1. Changed the Composer properties - Airflow Configuration Overrides - with the values:

webserver - default_ui_timezone: "Europe/Lisbon"
core - default_timezone: "Europe/Lisbon"
2. Created timezone-aware DAGs:

I am using the pendulum library and specifying the timezone. The schedule is working according to my timezone.
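
For reference, this is roughly the pattern in use - a minimal sketch only, assuming Airflow 2.x; the cron expression and the callable body are illustrative, not the exact DAG from this post:

import pendulum
from airflow import DAG
from airflow.operators.python import PythonOperator

# The start_date carries the Europe/Lisbon timezone, so the cron schedule
# below is interpreted in that timezone rather than UTC.
local_tz = pendulum.timezone("Europe/Lisbon")


def task_one_callable():
    print("Function One")


with DAG(
    dag_id="timezone_aware_dag3",
    schedule_interval="*/5 * * * *",  # illustrative schedule, interpreted in Europe/Lisbon
    start_date=pendulum.datetime(2023, 4, 1, tz=local_tz),
    catchup=False,
) as dag:
    task_one = PythonOperator(
        task_id="task_one",
        python_callable=task_one_callable,
    )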

What is working as expected:

The webserver UI is presented in my timezone - WebUI: OK
The DAG is executed according to the cron schedule in my timezone - Scheduling: OK

What is the issue?

It seems that internally Composer is not using my timezone as the default. For example, looking at a task log, the AIRFLOW_CTX_EXECUTION_DATE is still in UTC:

(...)
AIRFLOW_CTX_EXECUTION_DATE=2023-04-14T11:58:00+00:00
(...)
[2023-04-14, 12:59:06 WEST] {taskinstance.py:1416} INFO - Marking task as SUCCESS. dag_id=timezone_aware_dag3, task_id=task_one, execution_date=20230414T115800, start_date=20230414T115904, end_date=20230414T115906

So my log messages are in WEST (12:59:06 WEST), but the internal date metadata is still in UTC (execution_date=20230414T115800).
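
For what it's worth, converting the context value inside the task does give the local time - a minimal sketch, assuming Airflow 2.x; the callable name is illustrative:

from airflow.operators.python import get_current_context


def print_local_logical_date():
    # Meant to be used as a python_callable inside a running task.
    # The logical/execution date in the task context is timezone-aware UTC;
    # convert it explicitly when a local wall-clock value is needed.
    context = get_current_context()
    logical_date_utc = context["logical_date"]
    logical_date_local = logical_date_utc.in_timezone("Europe/Lisbon")
    print(f"Logical date (UTC):   {logical_date_utc}")
    print(f"Logical date (local): {logical_date_local}")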

Another issue is the scheduled time vs. the executed time: Airflow shows the logs according to UTC, but the scheduler according to my local time. See the image in this link: https://drive.google.com/file/d/1vT5C_6Q2xNLTzcRV1kgxijRigiXLypWM/view?usp=share_link

Expected behaviour: once I changed the Airflow core timezone, I expected all times to be handled in my timezone.

Complete Log of Task Execution:

[2023-04-14, 12:59:03 WEST] {taskinstance.py:1180} INFO - Dependencies all met for <TaskInstance: timezone_aware_dag3.task_one scheduled__2023-04-14T11:58:00+00:00 [queued]>
[2023-04-14, 12:59:04 WEST] {taskinstance.py:1180} INFO - Dependencies all met for <TaskInstance: timezone_aware_dag3.task_one scheduled__2023-04-14T11:58:00+00:00 [queued]>
[2023-04-14, 12:59:04 WEST] {taskinstance.py:1377} INFO - 
--------------------------------------------------------------------------------
[2023-04-14, 12:59:04 WEST] {taskinstance.py:1378} INFO - Starting attempt 1 of 3
[2023-04-14, 12:59:04 WEST] {taskinstance.py:1379} INFO - 
--------------------------------------------------------------------------------
[2023-04-14, 12:59:04 WEST] {taskinstance.py:1398} INFO - Executing <Task(PythonOperator): task_one> on 2023-04-14 11:58:00+00:00
[2023-04-14, 12:59:04 WEST] {standard_task_runner.py:52} INFO - Started process 6068 to run task
[2023-04-14, 12:59:04 WEST] {standard_task_runner.py:79} INFO - Running: ['airflow', 'tasks', 'run', 'timezone_aware_dag3', 'task_one', 'scheduled__2023-04-14T11:58:00+00:00', '--job-id', '340', '--raw', '--subdir', 'DAGS_FOLDER/5-dag_timezone_aware3.py', '--cfg-path', '/tmp/tmpy4s2bwl1', '--error-file', '/tmp/tmpessqsl6w']
[2023-04-14, 12:59:04 WEST] {standard_task_runner.py:80} INFO - Job 340: Subtask task_one
[2023-04-14, 12:59:05 WEST] {task_command.py:375} INFO - Running <TaskInstance: timezone_aware_dag3.task_one scheduled__2023-04-14T11:58:00+00:00 [running]> on host airflow-worker-ch98z
[2023-04-14, 12:59:06 WEST] {taskinstance.py:1591} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=vsilva
AIRFLOW_CTX_DAG_ID=timezone_aware_dag3
AIRFLOW_CTX_TASK_ID=task_one
AIRFLOW_CTX_EXECUTION_DATE=2023-04-14T11:58:00+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-04-14T11:58:00+00:00
[2023-04-14, 12:59:06 WEST] {logging_mixin.py:115} INFO - Function One
[2023-04-14, 12:59:06 WEST] {logging_mixin.py:115} INFO - DAG Timezone:
[2023-04-14, 12:59:06 WEST] {logging_mixin.py:115} INFO - Timezone('Europe/Lisbon')
[2023-04-14, 12:59:06 WEST] {python.py:173} INFO - Done. Returned value was: None
[2023-04-14, 12:59:06 WEST] {taskinstance.py:1416} INFO - Marking task as SUCCESS. dag_id=timezone_aware_dag3, task_id=task_one, execution_date=20230414T115800, start_date=20230414T115904, end_date=20230414T115906
[2023-04-14, 12:59:06 WEST] {local_task_job.py:156} INFO - Task exited with return code 0
[2023-04-14, 12:59:07 WEST] {local_task_job.py:273} INFO - 1 downstream tasks scheduled from follow-on schedule check

Thank you

Hi @vmasilva,

Welcome back to Google Cloud Community.

It appears that even after changing the timezone in the Composer properties and making your DAG timezone aware, your Airflow logs are still displaying UTC time.

One factor contributing to this may be that the Airflow logs are produced by the scheduler and workers, which are independent processes running on machines separate from the Composer environment settings you changed. Altering the Composer settings alone might not be sufficient to change the timezone for those components.

To address this, you may also try setting the timezone environment variable for the Airflow scheduler and workers to "Europe/Lisbon". This can be done by adding the following configuration in the Airflow UI (a sketch of reading these values from a task follows the steps):

  1. Go to the Airflow UI and click on the "Admin" tab.
  2. Click on "Connections" and then click on "Create".
  3. Enter "timezone" as the connection ID and "Europe/Lisbon" as the value for the "Extra" field.
  4. Click on "Save".
  5. Go to the "Variables" tab and click on "Create".
  6. Enter "AIRFLOW__CORE__TIMEZONE" as the key and "Europe/Lisbon" as the value.
  7. Click on "Save".
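
If the goal is for tasks to pick up this value at run time, here is a minimal sketch of reading the Variable from a task callable and localizing the run's logical date - assuming Airflow 2.x; the function name is illustrative:

import pendulum
from airflow.models import Variable
from airflow.operators.python import get_current_context


def localize_logical_date():
    # Read the timezone name stored in the Airflow Variable created in step 6
    # (falls back to UTC if the Variable is missing).
    tz_name = Variable.get("AIRFLOW__CORE__TIMEZONE", default_var="UTC")
    context = get_current_context()
    local_dt = context["logical_date"].in_timezone(pendulum.timezone(tz_name))
    print(f"Logical date in {tz_name}: {local_dt}")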

The "timezone" connection and "AIRFLOW__CORE__TIMEZONE" variable are set to "Europe/Lisbon" in this arrangement, which the Airflow scheduler and employees should recognize.

You can try running your DAG once more after adding this configuration to see if the logs reflect the proper timezone.

Here is some documentation that might help you:
https://cloud.google.com/composer/docs/run-apache-airflow-dag?_ga=2.184955241.-1392753435.1676655686

https://cloud.google.com/composer/docs/how-to/accessing/airflow-web-interface?_ga=2.75848405.-139275...

https://cloud.google.com/composer/docs/how-to/using/writing-dags?_ga=2.75848405.-1392753435.16766556...

Hi Aris,

Thank you for your reply. 

If I understood correctly, I have to set a new Variable and a new Connection in Airflow. The new Variable is fine, but I didn't understand exactly what values I should set in the new Connection:

Connection Id : "timezone"
Connection Type - what should I put in this field? By default the Connection Type is "Email". Shouldn't I change that?
The remaining fields (Description, Host, Schema, Login, Password...) - should I leave them empty?

Thank you

Hi @vmasilva,

You must include the following information when creating the new timezone connection in Airflow:

  • Connection ID: "timezone" (this should correspond to the connection ID used in your DAG).
  • Connection Type: this depends on the kind of timezone source you wish to connect to.
  • For instance, you might choose "Postgres" or "MySQL" as the connection type if you're connecting to a database that contains timezone information.
  • You may need to choose "HTTP" or "Google Cloud Platform" as the connection type if you're using a third-party API to get timezone data.
  • Description (optional): a short description of the connection.
  • Host: the hostname or IP address of the server you're connecting to; optional depending on the connection type.
  • Schema: the schema or database name, depending on the connection type.
  • Login: the username used to log in to the server; optional depending on the connection type.
  • Password: the password used to authenticate to the server, depending on the connection type.
How you fill in these fields depends on the kind of timezone source you're connecting to and how you're accessing it. For instance, if you were connecting to a third-party API to collect timezone information, you would need to provide the API endpoint URL in the Host field and any necessary authentication credentials in the Login and Password fields. If you were connecting to a database, you would need to specify the database hostname, schema name, and login credentials.
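
As a minimal sketch of where those fields end up, a task can read a connection created this way by its connection ID - assuming Airflow 2.x; the connection ID "timezone" matches the steps above, and the printed values are placeholders:

from airflow.hooks.base import BaseHook


def read_timezone_connection():
    # Look up the connection created in the UI by its connection ID.
    conn = BaseHook.get_connection("timezone")
    print(conn.conn_type)  # e.g. "http" or "postgres"
    print(conn.host)       # hostname or API endpoint URL, if set
    print(conn.login)      # username, if set
    print(conn.extra)      # the raw "Extra" field, e.g. "Europe/Lisbon"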

Hope this helps!