
OperationalError Cloud Composer

Hello,

I upgraded my Cloud Composer environment from composer-3-airflow-2.10.2-build.5 to composer-3-airflow-2.10.2-build.11. Since then, all my DAGs that generate about 40 tasks fail. Tasks go from queued to running, and sometimes even to deferred, but inevitably fail. These tasks are basic CloudRunExecuteJobOperator calls. I just trigger jobs on GCP Cloud Run Jobs from Airflow. It's such a basic task that I don't get why the system struggles with it. When I look at the logs, here is what I get:
 
 
airflow-worker Retrying <unknown> in 0.48700580194895077 seconds as it raised OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (::1), port 3306 failed: server closed the connection unexpectedly.

This probably means the server terminated abnormally before or while processing the request.

(Background on this error at: https://sqlalche.me/e/14/e3q8).
 
I already tried increasing the environment size (to large) and increasing the amount of RAM and CPU on the other components. I feel like the system cannot handle a lot of concurrent tasks deferring at once.
 

Hi @x-alex,

Thanks for your question. This forum focuses on Google Cloud's Application Integration. Could you please confirm if the issue you're describing occurs while using Application Integration? 😊

Hi @x-alex,

Welcome to Google Cloud Community!

The error you're seeing points to a database connection problem between your Airflow workers and the Airflow metadata database (usually Cloud SQL for Composer). Together with the fact that it started after a Composer environment upgrade, this suggests the cause is either a change in the environment configuration or a resource limitation.

Here are some potential solutions that might address your issue:

  • Rollback (Temporary): If possible and if the upgrade is the only change, consider rolling back to the previous Composer environment version (composer-3-airflow-2.10.2-build.5) as a temporary measure to confirm that the upgrade is indeed the root cause. This buys you time to investigate further.
  • Airflow Configuration Settings: Review the Airflow settings that govern task concurrency and metadata database connections. You can change them using Airflow configuration overrides within the Cloud Composer environment.
  • DAG Design: Review your DAGs to see if you can reduce the number of concurrent tasks or batch operations.
  • Scale Up Cloud SQL: If the Cloud SQL instance is experiencing high CPU, memory, or disk I/O usage, a reasonable solution would be to scale up to a larger instance with more resources.
  • Optimize Database Queries: If the Cloud SQL logs indicate slow queries, identify and optimize those queries. This might involve adding indexes to tables, rewriting queries, or using more efficient data structures.
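To illustrate the DAG Design suggestion, here is a minimal sketch of how the roughly 40 Cloud Run jobs could be run in smaller batches, so that fewer tasks hit the metadata database at the same time. It is not a confirmed fix for this error. The DAG id, project id, region, job names, and batch size are hypothetical placeholders, and deferrable mode is assumed only because the question mentions deferred tasks.

```python
from datetime import datetime
from typing import Iterator, Sequence


def chunk_jobs(job_names: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most batch_size job names."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(job_names), batch_size):
        yield list(job_names[start:start + batch_size])


def build_dag(job_names: Sequence[str], batch_size: int = 10):
    """Build a DAG in which each batch of jobs waits for the previous batch."""
    # Imported lazily so chunk_jobs can be used and tested without Airflow.
    from airflow import DAG
    from airflow.providers.google.cloud.operators.cloud_run import (
        CloudRunExecuteJobOperator,
    )

    with DAG(
        dag_id="cloud_run_jobs_batched",  # hypothetical DAG id
        start_date=datetime(2025, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        previous_batch = []
        for batch in chunk_jobs(job_names, batch_size):
            current_batch = [
                CloudRunExecuteJobOperator(
                    task_id=f"run_{name}",
                    project_id="my-project",  # hypothetical project id
                    region="europe-west1",    # hypothetical region
                    job_name=name,
                    deferrable=True,          # assumption: tasks run deferred
                )
                for name in batch
            ]
            # Every task in this batch depends on every task in the previous
            # batch, so at most batch_size of these tasks are active at once.
            for upstream in previous_batch:
                upstream >> current_batch
            previous_batch = current_batch
    return dag
```

In a real DAG file you would call something like `dag = build_dag([f"job-{i}" for i in range(40)])` at module level so the scheduler can discover it. With the default trigger rule, a failed task stops the later batches from running, so you may want a different trigger rule if the jobs are independent of each other.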

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
