Cloud Composer 2.6.2/3 seems to have a known issue with environment creation, even though my service account has the required roles. So the error message below is misleading:
Error:
CREATE operation on this environment failed...
Some of the GKE pods failed to become healthy. Please check the GKE logs for details, and retry the operation.
The issue may be caused by missing IAM roles in the following service accounts..
The issue doesn't seem solvable, and it appears to hit users at random: some people simply can't create Cloud Composer 2 environments, and I happen to be one of them. It first showed up on Issue Tracker in late December, so for now I have to keep troubleshooting it on my own.
Has anyone come across this same issue?
I kept getting this same error after many attempts, but I eventually solved it. Your solution might look different.
Composer calls the Kubernetes service, so look for the Kubernetes (GKE) cluster that was set up specifically for Composer. For some reason it wanted a huge amount of persistent disk storage (more than 500GB). I increased my quota to 2TB for the setup, and after that the service was able to initialize.
The error message was very misleading and pointed me in completely the wrong direction.
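Before retrying, one way to confirm that a regional disk quota (rather than IAM) is the real blocker is to compare the quota list from `gcloud compute regions describe REGION --format=json` against what the environment will request; each entry in that `quotas` list has `metric`, `limit`, and `usage` fields. The sketch below assumes you saved that JSON to a file. The helper name `quota_shortfalls` is hypothetical, and the 600 GB figure is an illustrative guess based on the "more than 500GB" observation above, not a documented Composer requirement.

```python
import json
import sys

# Hypothetical estimate of the extra SSD capacity (GB) the environment will request.
# The post only says "more than 500GB"; 600 is an illustrative guess, so replace it
# with the figure your GKE logs actually report.
REQUIRED = {"SSD_TOTAL_GB": 600.0}


def quota_shortfalls(region_info, required):
    """Return {metric: (limit, usage, needed)} for each quota that cannot absorb `needed` more units."""
    quotas = {q["metric"]: q for q in region_info.get("quotas", [])}
    short = {}
    for metric, needed in required.items():
        q = quotas.get(metric)
        if q is None:
            continue  # metric not reported for this region; nothing to compare
        if q["usage"] + needed > q["limit"]:
            short[metric] = (q["limit"], q["usage"], needed)
    return short


if __name__ == "__main__" and len(sys.argv) > 1:
    # Expects the JSON written by: gcloud compute regions describe REGION --format=json
    with open(sys.argv[1]) as f:
        info = json.load(f)
    for metric, (limit, usage, needed) in quota_shortfalls(info, REQUIRED).items():
        print(f"{metric}: limit {limit:g}, in use {usage:g}, needs {needed:g} more; request a quota increase")
```

With the default 500 GB SSD quota and nothing in use, `SSD_TOTAL_GB` is reported as short; after raising the limit to 2 TB, as described above, the check comes back empty.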
I got mine working; I'm not sure this will work for anyone else.
1. Double-double check the service account creds: https://cloud.google.com/composer/docs/composer-2/create-environments#grant-permissions
2. Make sure the number of workers and their configurations don't exceed the limits of the environment size:
- Small : Autoscaling between 1 and 3 workers, with 0.5 vCPU, 2 GB memory, 1 GB storage each
- Medium : Autoscaling between 2 and 6 workers, with 2 vCPU, 7.5 GB memory, 5 GB storage each
- Large : Autoscaling between 3 and 12 workers, with 4 vCPU, 15 GB memory, 10 GB storage each
I had a Small environment but with 4 workers, one more than its maximum of 3.
3. Check your GKE logs during environment setup: are you exceeding your quotas? I found that GCE was requiring more than the default 500GB of SSD capacity, which caused GKE to fail the health checks and erroneously told me it was because of IAM permissions. I requested a quota increase, which was approved within a minute.
I'm not sure which of the 3 fixed it, or whether it was a combination of all 3, but it works now. Hope that helps.
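Steps 1 and 2 can be sanity-checked offline before another creation attempt. The sketch below encodes the per-size limits from the list above (the poster's rule of thumb) and checks an IAM policy dumped with `gcloud projects get-iam-policy PROJECT_ID --format=json` for a given member. Treating `roles/composer.worker` as the required role for the environment's service account is an assumption based on the permissions page linked in step 1; check that page for the full list in your setup. The helper names are hypothetical.

```python
# Per-worker limits for each environment size, copied from the list in the post above.
PRESETS = {
    "small": {"max_workers": 3, "cpu": 0.5, "memory_gb": 2, "storage_gb": 1},
    "medium": {"max_workers": 6, "cpu": 2, "memory_gb": 7.5, "storage_gb": 5},
    "large": {"max_workers": 12, "cpu": 4, "memory_gb": 15, "storage_gb": 10},
}

# Role the environment's service account needs according to the permissions page
# linked in step 1 (assumption: check that page for the full list in your setup).
REQUIRED_ROLES = {"roles/composer.worker"}


def worker_config_problems(size, max_workers, cpu, memory_gb, storage_gb):
    """List the settings that exceed the chosen size preset (the poster's rule of thumb)."""
    preset = PRESETS[size]
    problems = []
    if max_workers > preset["max_workers"]:
        problems.append(f"max_workers {max_workers} > {preset['max_workers']}")
    for name, value in (("cpu", cpu), ("memory_gb", memory_gb), ("storage_gb", storage_gb)):
        if value > preset[name]:
            problems.append(f"{name} {value:g} > {preset[name]:g}")
    return problems


def missing_roles(policy, member, required=REQUIRED_ROLES):
    """Return required roles that `member` lacks in a project IAM policy dict.

    `policy` is the parsed JSON from: gcloud projects get-iam-policy PROJECT_ID --format=json
    `member` uses IAM syntax, e.g. "serviceAccount:NAME@PROJECT.iam.gserviceaccount.com".
    """
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    return sorted(set(required) - granted)
```

For example, `worker_config_problems("small", 4, 0.5, 2, 1)` returns `["max_workers 4 > 3"]`, which is the situation described in step 2. Note that `missing_roles` only looks at direct project-level bindings; roles granted through groups, conditions, or inheritance from folders or the organization won't be seen.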
I'm also having the same problem. I tried creating three environments, and all three failed with this error.
Hi, did you find any solution to this?
Thank you! It worked for me as well.
Solved by increasing the quota.
Can you provide a guide for increasing the quota?