We are experiencing an issue with the GKE cluster autoscaler. Since June 5th, it has periodically stopped working on our cluster in the us-central1-c zone, and since June 11th, the issue has also appeared on our cluster in the us-central1-a zone in the same project.
Due to this problem, our workloads in GKE can remain in the Pending status for hours until the cluster autoscaler starts working again. The issue occurs with both N1 and N2 instance types. We are not exceeding our quotas and can launch a static number of nodes with the same labels and taints that we expect the cluster autoscaler to manage.
The most recent occurrence was on June 11, from 13:36 to 16:12 (Kyiv time).
During this period, we did not see any events from the cluster autoscaler in Cloud Logging.
Version of GKE: 1.27.11-gke.1062004
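
For anyone who wants to check the same window, here is a minimal sketch that builds a Cloud Logging filter for cluster autoscaler visibility events. It is not necessarily the query used for the check described above. The cluster name `my-cluster` and the year are assumptions (the report gives neither), and the fixed UTC+3 offset relies on Kyiv being on summer time in June.

```python
from datetime import datetime, timedelta, timezone

# Kyiv observes summer time (UTC+3) in June, so a fixed offset is enough here.
KYIV_SUMMER = timezone(timedelta(hours=3))


def autoscaler_log_filter(cluster_name, location, start_local, end_local):
    """Build a Logging query for autoscaler visibility events in a time window."""

    def to_utc(ts):
        # Cloud Logging accepts RFC 3339 timestamps; convert local time to UTC.
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    clauses = [
        'resource.type="k8s_cluster"',
        f'resource.labels.cluster_name="{cluster_name}"',
        f'resource.labels.location="{location}"',
        'log_id("container.googleapis.com/cluster-autoscaler-visibility")',
        f'timestamp>="{to_utc(start_local)}"',
        f'timestamp<="{to_utc(end_local)}"',
    ]
    return " AND ".join(clauses)


YEAR = 2024  # assumption: the report does not state the year
start = datetime(YEAR, 6, 11, 13, 36, tzinfo=KYIV_SUMMER)
end = datetime(YEAR, 6, 11, 16, 12, tzinfo=KYIV_SUMMER)

# "my-cluster" is a hypothetical cluster name used only for illustration.
log_filter = autoscaler_log_filter("my-cluster", "us-central1-c", start, end)
print(log_filter)
```

The resulting string can be pasted into the Logs Explorer query box or passed as the filter argument to `gcloud logging read`. An empty result for the window would be consistent with the autoscaler emitting no visibility events during that time.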