Cluster autoscaler won't scale nodes

We are experiencing an issue with the GKE cluster autoscaler. Since June 5th, it has periodically stopped working on our cluster in the us-central1-c zone, and since June 11th the same issue has also appeared on a cluster in us-central1-a in the same project.
Because of this, our GKE workloads can remain in Pending status for hours until the cluster autoscaler starts working again. The issue occurs with both N1 and N2 instance types. We are not exceeding our quotas, and we can launch a static number of nodes with the same labels and taints that we expect the cluster autoscaler to manage.
The most recent occurrence was June 11, 13:36 to 16:12 (Kyiv time).
During this period, we did not see any events from the cluster autoscaler in Cloud Logging.
GKE version: 1.27.11-gke.1062004
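
For anyone who wants to check the same window, here is a rough sketch of a helper that builds a Cloud Logging filter for the cluster autoscaler visibility log. The cluster name `my-cluster` and the year are placeholders (neither is stated in the post), and the fixed UTC+3 offset assumes Kyiv is on summer time in June. The output is meant to be pasted into Logs Explorer.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Kyiv is on summer time (UTC+3) in June.
KYIV_SUMMER = timezone(timedelta(hours=3))


def autoscaler_log_filter(cluster_name, location, start, end):
    """Build a Cloud Logging filter for cluster autoscaler visibility events.

    start and end must be timezone-aware datetimes; they are converted to UTC.
    """
    def to_utc(ts):
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return " AND ".join([
        'resource.type="k8s_cluster"',
        f'resource.labels.cluster_name="{cluster_name}"',
        f'resource.labels.location="{location}"',
        'log_id("container.googleapis.com/cluster-autoscaler-visibility")',
        f'timestamp>="{to_utc(start)}"',
        f'timestamp<="{to_utc(end)}"',
    ])


if __name__ == "__main__":
    # Hypothetical values: "my-cluster" and the year 2024 are placeholders.
    print(autoscaler_log_filter(
        "my-cluster",
        "us-central1-c",
        datetime(2024, 6, 11, 13, 36, tzinfo=KYIV_SUMMER),
        datetime(2024, 6, 11, 16, 12, tzinfo=KYIV_SUMMER),
    ))
```

If this filter returns nothing for the incident window while pods are Pending, that would match what we observed: the autoscaler is not emitting decisions at all, rather than reporting that it cannot scale up.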

2 REPLIES

I have experienced a similar issue on GKE 1.27. What finally resolved it for me was upgrading GKE to 1.28. I am currently running 1.28.9-gke.1000000 on my clusters, and everything runs as expected.

Do you see any messages in the pod/deployment event logs?