
Create operation on Cloud Composer 2 failed

Hello,

I'm trying to create a Composer 2 environment but I'm getting the error below (see screenshot). I couldn't find an answer anywhere; please help.

[Screenshot]

CREATE operation on this environment failed 6 hours ago with the following error message:
Some of the GKE pods failed to become healthy. Please check the GKE logs for details, and retry the operation.

 

1 ACCEPTED SOLUTION

If you don't have any clusters created yet, it indicates that the Composer environment creation process failed before the cluster could be properly set up. This can happen due to several reasons, including misconfigurations, insufficient permissions, or quota issues.


27 REPLIES

To resolve the issue with creating a Composer 2 environment where some of the GKE pods failed to become healthy, follow these comprehensive troubleshooting steps:

1. Examine GKE Logs:

  • Navigate to the Google Kubernetes Engine (GKE) section in the Google Cloud Console.
  • Locate the cluster associated with your Composer environment and click to open its details.
  • Filter logs by pod name: Focus on the logs of the specific pods that failed to start for targeted troubleshooting.
  • Look for "CrashLoopBackOff": Identify pods repeatedly crashing and restarting, then check their logs to understand why.
  • Examine the "Events" tab: Review events related to pod creation and failures for additional clues.
  • Common GKE Log Errors: Watch for permission issues ("forbidden," "permission denied"), image pull failures ("ImagePullBackOff"), and application-specific errors.
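If you have exported pod logs or events as plain text, the error strings in step 1 can be scanned for with a small script. This is only a rough sketch: the category names are made up for illustration, `ErrImagePull` is added because Kubernetes reports it alongside `ImagePullBackOff`, and the sample lines are invented rather than taken from a real Composer cluster.

```python
import re
from collections import Counter

# Substrings from the checklist above, mapped to rough categories.
# The category names are illustrative, not official GKE terminology.
PATTERNS = {
    "permission": re.compile(r"forbidden|permission denied", re.IGNORECASE),
    "image_pull": re.compile(r"ImagePullBackOff|ErrImagePull", re.IGNORECASE),
    "crash_loop": re.compile(r"CrashLoopBackOff", re.IGNORECASE),
}


def classify_line(line: str) -> list[str]:
    """Return every category whose pattern appears in the log line."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(line)]


def summarize(lines: list[str]) -> Counter:
    """Count how many lines fall into each category."""
    counts = Counter()
    for line in lines:
        counts.update(classify_line(line))
    return counts


# Invented sample lines, only to show the shape of the input.
sample = [
    "Back-off restarting failed container: CrashLoopBackOff",
    "pods is forbidden: User cannot list resource",
    "Failed to pull image: ImagePullBackOff",
    "Started container airflow-worker",
]
print(summarize(sample))
```

A high count in one category is a hint about where to look first; it does not replace reading the full log entry.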

2. Assess Pod Health and Resources:

  • Go to the "Workloads" section in your GKE cluster.
  • Check for pods in Error, CrashLoopBackOff, or Pending states.
  • Click on problematic pods for detailed error messages.
  • Ensure your GKE cluster has sufficient resources (CPU, memory, etc.) to support Composer.
  • Verify no resource quotas or limits are hindering pod startups.
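The pod states in step 2 can also be read from the output of `kubectl get pods -o json` once a cluster exists. The sketch below assumes that output has been saved to a file or loaded into a dict; the function names and the sample pod names are hypothetical, and the fields used (`status.phase`, `status.conditions`, `status.containerStatuses[].state.waiting.reason`) are the standard Kubernetes pod status fields.

```python
import json


def load_pods(path: str) -> dict:
    """Load a saved `kubectl get pods -o json` dump (path is up to you)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def unhealthy_pods(pod_list: dict) -> list[tuple[str, str]]:
    """Return (pod name, reason) pairs for pods that are not running cleanly."""
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        # A pod that could not be placed on a node has PodScheduled=False,
        # usually with reason "Unschedulable".
        for cond in status.get("conditions", []):
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                problems.append((name, cond.get("reason", "Unschedulable")))
        # Waiting containers carry reasons such as CrashLoopBackOff or
        # ImagePullBackOff (and ContainerCreating while still starting up).
        for container in status.get("containerStatuses", []):
            waiting = container.get("state", {}).get("waiting")
            if waiting:
                problems.append((name, waiting.get("reason", "Waiting")))
        if status.get("phase") == "Failed":
            problems.append((name, status.get("reason", "Failed")))
    return problems


# Invented sample in the shape kubectl produces; pod names are hypothetical.
sample = {
    "items": [
        {"metadata": {"name": "airflow-worker-abc"},
         "status": {"phase": "Pending",
                    "conditions": [{"type": "PodScheduled", "status": "False",
                                    "reason": "Unschedulable"}]}},
        {"metadata": {"name": "airflow-scheduler-xyz"},
         "status": {"phase": "Running",
                    "containerStatuses": [
                        {"name": "scheduler",
                         "state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
        {"metadata": {"name": "redis-0"},
         "status": {"phase": "Running",
                    "containerStatuses": [
                        {"name": "redis", "state": {"running": {}}}]}},
    ]
}
for pod_name, reason in unhealthy_pods(sample):
    print(f"{pod_name}: {reason}")
```

An Unschedulable pod usually points to node capacity or quota, while CrashLoopBackOff points to the container's own logs.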

3. Verify Network Configuration:

  • Review your GKE cluster's network settings.
  • Ensure no network policies or firewall rules are blocking pod communication.

4. Retry Environment Creation:

  • Transient issues can sometimes resolve themselves.
  • Delete the partially created environment before retrying to avoid conflicts.
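For step 4, a delete-then-recreate cycle can be scripted around the `gcloud composer environments` commands. This sketch only builds the command lines and prints them, so you can review them before running anything; the environment name, region and image version below are placeholders, and depending on your gcloud version and project setup you may need additional flags (for example a service account) that are not shown here.

```python
import shlex


def delete_command(env_name: str, location: str) -> list[str]:
    """Build the command that removes a partially created environment."""
    return [
        "gcloud", "composer", "environments", "delete", env_name,
        "--location", location,
        "--quiet",  # skip the interactive confirmation prompt
    ]


def create_command(env_name: str, location: str, image_version: str,
                   size: str = "small") -> list[str]:
    """Build the command that creates a Composer 2 environment again."""
    return [
        "gcloud", "composer", "environments", "create", env_name,
        "--location", location,
        "--image-version", image_version,
        "--environment-size", size,
    ]


# Placeholder values; replace them with your own before running anything.
for command in (
    delete_command("my-env", "us-central1"),
    create_command("my-env", "us-central1", "IMAGE_VERSION"),
):
    print(shlex.join(command))
```

Printing first and running by hand (or via `subprocess.run(command, check=True)`) keeps the destructive delete step under your control.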

5. Check Versions and Logs:

  • Composer and GKE: Ensure you're using compatible versions (refer to the compatibility matrix).
  • Cloud Build Logs: If using custom images, check for errors during the image building process.
  • Airflow Webserver Logs: Look for errors related to worker pod communication or scheduler issues.
  • Composer Environment Logs: Review logs within the failing environment for specific errors.

6. Additional Checks:

  • Permissions and IAM Roles: Ensure the Composer service account has the required IAM roles and permissions for resource management.
  • Composer Environment Variables: Double-check the accuracy of any custom environment variable values.
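The IAM check in step 6 can be done against a policy exported as JSON, for example with `gcloud projects get-iam-policy PROJECT_ID --format=json`. The sketch below assumes the usual split of roles: `roles/composer.worker` on the environment's service account and `roles/composer.ServiceAgentV2Ext` for the Cloud Composer Service Agent. The project number and account names are placeholders, and in your setup the service agent binding may live on the service account's own policy rather than the project policy.

```python
def roles_for(policy: dict, member: str) -> set[str]:
    """Collect every role the policy binds to the given member."""
    return {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }


def missing_roles(policy: dict, member: str, required: list[str]) -> set[str]:
    """Return the required roles that the member does not hold."""
    return set(required) - roles_for(policy, member)


# Placeholder identities; substitute your own project number and account.
ENV_SA = "serviceAccount:composer-env@my-project.iam.gserviceaccount.com"
AGENT = ("serviceAccount:service-123456789"
         "@cloudcomposer-accounts.iam.gserviceaccount.com")

# Invented policy in the shape gcloud prints with --format=json.
sample_policy = {
    "bindings": [
        {"role": "roles/editor", "members": [ENV_SA]},
        {"role": "roles/composer.ServiceAgentV2Ext", "members": [AGENT]},
    ]
}

print(missing_roles(sample_policy, ENV_SA, ["roles/composer.worker"]))
print(missing_roles(sample_policy, AGENT, ["roles/composer.ServiceAgentV2Ext"]))
```

An empty set means the binding is present in that policy; it does not prove the role is sufficient for every resource Composer touches.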

If you still encounter issues after following these steps, contact Google Cloud Support for further assistance. Provide them with relevant logs and error messages to help them diagnose the problem effectively.

I don't have any clusters created yet. If I go to Kubernetes Engine in GCP, I can't see anything. 

[Screenshot]

Can I contact Google Cloud Support for free? Or is there any pricing behind it?

If you don't have any clusters created yet, it indicates that the Composer environment creation process failed before the cluster could be properly set up. This can happen due to several reasons, including misconfigurations, insufficient permissions, or quota issues.

I am struggling with the same issue too, @lucie_rabochova @ms4446. Similarly, I couldn't find any logs or any clusters created. I granted all the required permissions to my custom Composer account, such as Editor and roles/composer.ServiceAgentV2Ext, but I still encounter the same error. Can someone help me with this issue?

Please, if you could, file an issue here: https://issuetracker.google.com/ and reference this thread as well, since GCP Customer Engineers review and moderate these forums. We need as much visibility as we can get, because I think something intricate is going on with the "composer-agent" service: it seems to go into a failure loop when mounting gcsFuse. (Pardon me if your logs show something different, but if it takes 45+ minutes to arrive at this failure with the same "some pods failed to become healthy" message, I'm betting it is the same problem.)

Also, could you name the region where you're seeing this? I am seeing it in us-central1 and some folks are having issues in Europe, so it would be good to gather information on whether this is a global or region-specific problem.

Hoping this gets better soon. My team will need to add a dependency or create/destroy one of their dev instances at some point, and I won't be able to assist them. Thanks for commenting!

Hi,

I'm currently having the same problem despite trying many times. Thanks to this thread, I explored GKE more closely during environment creation and found some clues: both Unschedulable and CrashLoopBackOff states show up in this screenshot. I think the main issue must come from the worker, which is Unschedulable.

Cloud Composer used to work perfectly fine before, but now I cannot create a new "small" environment.

Does anyone have a clue how to fix this, or how to reach out to the Google Support team to investigate this issue?
Thank you

I searched more on this issue. It is logged here that the Unschedulable state occurs because the node cannot be scaled up. Still, I fail to create a cluster. Any help is welcome. Please kindly fix this. Cloud Composer is useless if it can't be created.

Hey @fonylew, did you find any resolution? I am also facing the same issue.

Not at the moment. I found that this issue has been logged in Issue Tracker and assigned to the Cloud Composer team: https://issuetracker.google.com/issues/346579985

Hi, good to hear I'm not alone. I would like to know if there is free Google support that can fix the error.

I too am having this issue, and someone else on Stack Overflow is as well. I recommend everyone open support cases so these problems become more visible to GCP Support. Thanks to the other folks for the well-intended advice. I've been working with this service rather successfully for 2-3 years, and this is impeding my ability to make sure my team can create their own dev clusters using the Terraform IaC I wrote, which has not changed since it last worked, around 05/13.

I had a similar issue and tried to create the "small" Cloud Composer environment in different regions. It finally worked in europe-west1; in other European regions, such as europe-west4, it didn't.

Right, and forgive me, I understand your time is limited, but since it might matter to someone who needs to use that zone, I'd argue it's worth filing an Issue Tracker report so that this can get better. https://issuetracker.google.com/

I've now created an issue on issuetracker.google.com for it, as it failed for me again several times when I tried to create another environment.

https://issuetracker.google.com/issues/346579985

@lucie_rabochova, were you able to solve the issue? Can you please reply with the solution that helped you?

Hi all,

I've tried everything, but no luck. I went step by step, checked everything and tested it; I don't understand it at all, and it's quite frustrating. I finally found a workaround, which is not ideal, but I can't wait any longer.

I created a new project in GCP just for Cloud Composer and the environment was created without any problem. 

I'm definitely not giving up, and I'll let you know if I find a proper solution.

 

Thank you! I think you see the value of keeping an issue open: if you come to rely on the option of "make a new GCP project and start over", eventually there's going to be a problem. It sounds like you are in the exploratory stages with Cloud Composer, either now or with the org you are currently working with. Even "turning the API off and on" really isn't acceptable, as it destroys existing resources in the process. All this to say, thank you for reporting back. I hope your future work in the new GCP project stays stable.

Even once this is resolved (which has not happened for me either; I'm testing again now), it is going to affect my faith in this orchestrated service enough to make different architectural decisions, which really ought not to be the case when using one of the main hyperscalers. But to a degree my team is already entrenched with this resource.

I tried 3 times in 3 different projects, but nothing seems to work. I tried granting all permissions and still get the same "GKE pods are unhealthy" error.
Is there a Discord channel for GCP? Please let me know.

 

Yes, true. I tried granting all the specified roles, and also tried a different location, but I still receive the same error: "Some of the GKE pods failed to become healthy".

Hey all, please vote +1 to communicate that you are impacted in this issue created by @michael_go: https://issuetracker.google.com/issues/346579985

Do update this thread with any solution if you find it!

Any updates or progress?

I've also been facing this issue for the last two days and haven't found any resolution yet. Please help me out.

[Screenshot]

 

I've been facing the same issue for a week now. Has anyone managed to fix it?

The best I can offer is to add your +1 to this Issue Tracker ticket if you haven't already and chime in with your experience (region, environment size, etc.):
Unable to create Composer 2 environment [346579985] - Issue Tracker (google.com)
I just periodically test my Terraform setup to see if anything changes, but personally, nope, nothing has changed yet; same error.

Hello,

First, sorry for my English; I am still learning.

Second, I have been experiencing this error for two weeks, but today I tried in europe-north1 and it ran successfully.

I just tried Composer v3 today and was able to create a small environment.

Yes, v3 environments get created, but in v2 I am still facing the issue. I have added the permissions and increased the quota to 4 TB, but I am still not able to create the Composer environment.