
Control plane API availability and cluster repairs

Hello,

We've recently noticed that one of our Kubernetes clusters becomes inaccessible via the control plane API during deployments. The downtime typically lasts between 30 minutes and an hour. We've also observed that during these periods, the REPAIR_CLUSTER process starts. Here are the recent occurrences:

2025-02-07T00:19:44.322795855Z
2025-02-07T09:38:43.573013059Z
2025-02-07T11:26:43.549065231Z
2025-02-10T12:15:43.703596753Z
2025-02-10T12:56:43.46727104Z
2025-02-10T13:37:43.864985605Z
2025-02-10T14:20:43.599401413Z
2025-02-10T19:25:43.95928846Z
2025-02-11T09:11:45.334995747Z
2025-02-11T09:43:45.136944182Z
2025-02-14T15:14:11.761866548Z
2025-02-14T16:00:14.227655526Z

Could you please help us understand why REPAIR_CLUSTER is triggered so frequently?

Is there anything we can do to prevent periodic API control plane unavailability?

Thank you in advance!


To better understand the issue, you will need to check the logs.

Check Control Plane Logs & Metrics
View logs from the Kubernetes API Server:
gcloud logging read "resource.type=k8s_cluster AND logName:stderr" --limit 50


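Check whether the outages coincide with control plane upgrade operations (this command lists operations; it does not show CPU or memory usage):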
gcloud container operations list --filter="operationType=UPGRADE_MASTER"


View recent cluster events:
kubectl get events --sort-by=.metadata.creationTimestamp -A

Investigate API Overload During Deployments

If too many resources (pods, deployments, services) are updated at once, the API server might become overwhelmed.
Run the following command and look for timeouts or unavailable components:
kubectl get apiservices
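
One way to avoid updating many resources at once is to apply manifests one at a time with a pause in between. The following is only a minimal sketch: `apply_staggered` is a hypothetical helper, the file names are placeholders, and a suitable pause depends on your cluster.

```shell
#!/usr/bin/env bash
# apply_staggered PAUSE_SECONDS MANIFEST...
# Hypothetical helper: applies each manifest in turn and sleeps between
# applies so that the updates do not all reach the API server at once.
apply_staggered() {
  local pause="$1"
  shift
  local manifest
  for manifest in "$@"; do
    # Stop at the first failure instead of continuing the rollout.
    kubectl apply -f "$manifest" || return 1
    sleep "$pause"
  done
}

# Example (placeholder file names):
#   apply_staggered 30 deployment-a.yaml deployment-b.yaml
```

Stopping at the first failed apply keeps a broken rollout from piling further requests onto an API server that may already be struggling.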

To Prevent Future Issues:
List recent REPAIR_CLUSTER operations and check when they occurred:

gcloud container operations list --filter="operationType=REPAIR_CLUSTER"
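
It can help to look at the spacing between repair operations. In the question, for example, the 2025-02-10 entries at 12:15, 12:56 and 13:37 are each 41 minutes apart. The sketch below prints the gap between consecutive timestamps. `repair_gaps` is a hypothetical helper, it assumes GNU date (Linux), and the usage comment assumes the operations expose a `startTime` field.

```shell
#!/usr/bin/env bash
# repair_gaps: reads RFC 3339 timestamps (one per line, oldest first) on stdin
# and prints each one with the gap in whole minutes since the previous one.
# Hypothetical helper; assumes GNU date. Fractional seconds are dropped.
repair_gaps() {
  local prev="" ts base epoch
  while read -r ts; do
    [ -z "$ts" ] && continue
    base="${ts%Z}"        # drop the trailing Z
    base="${base%%.*}"    # drop fractional seconds
    epoch=$(date -u -d "${base}Z" +%s) || return 1
    if [ -n "$prev" ]; then
      echo "$ts +$(( (epoch - prev) / 60 )) min"
    else
      echo "$ts (first)"
    fi
    prev="$epoch"
  done
}

# Example, assuming the operations expose a startTime field:
#   gcloud container operations list --filter="operationType=REPAIR_CLUSTER" \
#     --format="value(startTime)" --sort-by=startTime | repair_gaps
```

Comparing these gaps with your deployment times may show whether the repairs follow deployments or recur on their own schedule.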

Limit API Requests per Deployment: Reduce unnecessary API calls by caching responses.
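
As a rough illustration of caching responses in a deployment script, the sketch below reuses the output of a command for a short time instead of querying the API server again. `cached_run` is a hypothetical helper, the cache path is a placeholder, and it assumes GNU stat (Linux).

```shell
#!/usr/bin/env bash
# cached_run TTL_SECONDS CACHE_FILE COMMAND [ARGS...]
# Hypothetical helper: prints the cached output of COMMAND if CACHE_FILE is
# younger than TTL_SECONDS, otherwise runs COMMAND and refreshes the cache.
# Assumes GNU stat for reading the file modification time.
cached_run() {
  local ttl="$1" cache_file="$2"
  shift 2
  local now mtime
  now=$(date +%s)
  if [ -f "$cache_file" ]; then
    mtime=$(stat -c %Y "$cache_file")
    if [ $(( now - mtime )) -lt "$ttl" ]; then
      cat "$cache_file"
      return 0
    fi
  fi
  # Write to a temporary file first so a failed command does not clobber the cache.
  "$@" > "$cache_file.tmp" || { rm -f "$cache_file.tmp"; return 1; }
  mv "$cache_file.tmp" "$cache_file"
  cat "$cache_file"
}

# Example: reuse the deployment list for 60 seconds instead of querying again.
#   cached_run 60 /tmp/deployments.txt kubectl get deployments -A
```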

Use Horizontal Pod Autoscaling (HPA): Instead of recreating pods, let the cluster autoscale based on demand.

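Resize the Node Pool if Needed: On GKE the control plane is managed by Google and cannot be scaled manually. The command below changes the number of worker nodes, not the control plane capacity:
gcloud container clusters resize YOUR_CLUSTER_NAME --num-nodes=3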
