Solved: Re: Are you able to create and run a basic autopil...

young · 02-09-2024 05:04 AM

When I try to create a basic Autopilot cluster, the cluster appears to be created successfully, but none of the pods are actually running; they all hang in Pending.

When I describe a pod in kube-system, the events say that no node is available.

node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  7m20s (x2 over 7m31s)  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {cloud.google.com/gke-quick-remove: true}. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  119s (x5 over 8m29s)   default-scheduler  no nodes available to schedule pods

I really have no idea what's wrong. Is anyone else having the same trouble?

mahmoudrabie

Hi Young

The error messages suggest that the pods can't be scheduled because the available nodes have taints that your pods do not tolerate. Specifically, the cloud.google.com/gke-quick-remove: true taint is mentioned, which is not a standard taint for GKE Autopilot nodes.

Another part of the error message indicates that no nodes are available to schedule pods. In GKE Autopilot, nodes are managed by Google Cloud, and you shouldn't need to manually intervene to ensure node availability.

Therefore, I would recommend the following:

(1) Ensure your cluster is correctly configured for Autopilot mode. There might have been an error or misconfiguration during the cluster creation process.

(2) Verify if the cluster is in Autopilot mode by checking its details in the Google Cloud Console or using the gcloud command line tool.
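As a rough sketch of check (2), the cluster description can be filtered down to its Autopilot flag. The cluster name and region below are hypothetical placeholders, and the helper name `autopilot_enabled` is only for illustration:

```shell
# Hypothetical placeholders; replace with your own cluster name and region.
CLUSTER_NAME="${CLUSTER_NAME:-my-autopilot-cluster}"
REGION="${REGION:-us-central1}"

# Print the value of the cluster's autopilot.enabled field.
# A true value means the cluster was created in Autopilot mode.
autopilot_enabled() {
  gcloud container clusters describe "$CLUSTER_NAME" \
    --region "$REGION" \
    --format="value(autopilot.enabled)"
}
```

Once gcloud is authenticated against the right project, running `autopilot_enabled` should print the flag; an empty result would suggest a Standard mode cluster.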

Kind regards

Mahmoud

mahmoudrabie

Hi @young

The issue you're facing with the Autopilot cluster, where pods remain in a pending state due to a lack of available nodes, can be attributed to a few possible causes. To troubleshoot this issue, let's follow these steps:

(1) Verify the Cluster Status

gcloud container clusters list --region YOUR_REGION

(2) Check Quota Limits

One common issue with cluster creation or pod scheduling can be GCP quota limits. Ensure you haven't hit any limits that might prevent the creation of new nodes.

gcloud compute regions describe YOUR_REGION

This command gives you information about your current usage and quotas in the region, including Compute Engine resources which Autopilot uses to manage nodes.
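The full output of that command is long, so it may help to flatten the quota list into a table. This is a sketch only; the region default is a hypothetical placeholder and `show_region_quotas` is an illustrative helper name:

```shell
# Hypothetical placeholder; replace with the region of your cluster.
REGION="${REGION:-us-central1}"

# Show one row per regional quota metric with its current usage and limit.
show_region_quotas() {
  gcloud compute regions describe "$REGION" \
    --flatten="quotas[]" \
    --format="table(quotas.metric,quotas.usage,quotas.limit)"
}
```

Running `show_region_quotas` and looking for rows where usage is close to the limit (CPUs, in-use addresses, persistent disk) is usually quicker than reading the raw description.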

I hope that helps

Regards

Mahmoud

young

Hello @mahmoudrabie, thanks for the tip.

I'm still trying to figure it out, though, and have no clue so far.

When I check the cluster, the STATUS value is RUNNING and the NUM_NODES value is empty.

As for quotas, I don't see any usage that has reached its limit.

Also, I'm creating the cluster in an otherwise empty region, and when I create a Standard mode cluster there, it comes up with nodes without any problem.

It seems the machine type of the Autopilot nodes is e2-small, so I tried creating a Compute Engine instance of the same type to check resource availability. That seemed fine, although one attempt failed on availability of a 10 GB pd-balanced disk.

However, if resource availability were the problem, I assume the cluster creation would have failed and the console notifications would have said so.

It's not easy to find any event information about this. 🤔

garisingh

Autopilot clusters are created with 0 nodes (well technically nodes are created but then immediately deleted).
After you deploy your pods, it should trigger a scale-up event and add the appropriate nodes.

What version of GKE Autopilot are you using?
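To see the scale-up described here, one option is to deploy a small test workload and wait for it to roll out. This is only a sketch: the deployment name is a hypothetical placeholder, the public `nginx` image is an arbitrary choice, and `trigger_scale_up` is an illustrative helper name.

```shell
# Hypothetical deployment name; any small workload should do.
APP_NAME="${APP_NAME:-hello-autopilot}"

trigger_scale_up() {
  # Create a small workload. Its Pending pod is what prompts Autopilot
  # to provision a node.
  kubectl create deployment "$APP_NAME" --image=nginx || return 1
  # Wait for the rollout; node provisioning can take a few minutes.
  kubectl rollout status "deployment/$APP_NAME" --timeout=10m || return 1
  # A node should now be listed, and the system pods should be scheduled.
  kubectl get nodes
  kubectl get pods -n kube-system
}
```

After running `trigger_scale_up`, the kube-system pods that were stuck in Pending should start moving to Running once the new node is ready.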

young

Thank you, @garisingh!

For the past couple of days I've been creating and deleting GKE clusters with pretty much default configurations, and the version is 1.27.8-gke.1067004.

That finally resolves what has been puzzling me for the past couple of days.

Right after the clusters were created, I only watched the system pods, including those in the kube-system namespace, and none of them were running.

This wasn't what I expected, and I didn't try deploying my own pods, though I did try restarting the kube-system deployments.

I didn't know that GKE Autopilot clusters don't run anything initially.
