
All pods in my cluster restart randomly

The pods in my GKE cluster restart randomly. Sometimes it happens after 7 days, sometimes after only 1-2 days. The logs show that the whole node pool is drained and recreated, and the pods are then started on the new node pool. I want to prevent this because it is happening in my production environment; the outage is only about 15 seconds, but we cannot tolerate any downtime. Is there a way to find the root cause, and how can I prevent this in the future? I have set up an uptime check, which fails when this occurs; when I inspect the pods, they have all been recreated on the new node pool and they all share the same creation time.

5 REPLIES

My best guess at this point is that you likely have auto-upgrade enabled for your node pools (which is the default).
You can check to see if any upgrades have occurred recently by running `gcloud container operations list`.
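
For example (a rough sketch; the operationType filter and output fields are standard gcloud usage, but adjust them to your project and location):

# List recent node upgrade operations and when they ran.
gcloud container operations list \
    --filter="operationType=UPGRADE_NODES" \
    --format="table(name, operationType, status, targetLink, startTime, endTime)"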

Assuming this is the case, you'll likely want to configure Maintenance Windows and/or Maintenance Exclusions so that upgrades only happen when you can tolerate downtime. With a Maintenance Exclusion, you can select the "no minor or node upgrades" scope to block upgrades for up to 6 months at a time, until the GKE version reaches end of support.
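
As a rough sketch, such an exclusion can be added with something like the following (cluster name, location, dates, and exclusion name are placeholders; check the current gcloud reference for the exact scope values):

# Block minor and node upgrades for a fixed window.
gcloud container clusters update CLUSTER_NAME \
    --location=us-central1 \
    --add-maintenance-exclusion-name=block-node-upgrades \
    --add-maintenance-exclusion-start=2023-09-15T00:00:00Z \
    --add-maintenance-exclusion-end=2024-03-15T00:00:00Z \
    --add-maintenance-exclusion-scope=no_minor_or_node_upgrades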

On 2nd September my whole node pool was recreated at 2:00 PM IST, and then at 5:00 PM it was recreated again. Why did the upgrade happen twice? I just want to disable every automatic upgrade; I will upgrade it myself. Can I use the blue-green upgrade option instead of the surge option? Will it prevent downtime?

It's odd that it was upgraded twice in such a short period. Any chance someone made a change to the node pool configuration? There are a few properties that will trigger a node pool recreation if you change them.

In terms of disabling auto-upgrades, take a look at my post above.  I'd suggest using a Maintenance Exclusion with the "no node upgrades" scope.

Blue-green can help, depending on the number of replicas you have, whether or not you have a PDB (PodDisruptionBudget) set, etc.
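
As a rough sketch of both (pool, cluster, location, label selector, and durations are placeholders, assuming your workload's pods are labeled app=my-app):

# Switch the node pool from surge to blue-green upgrades.
gcloud container node-pools update POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=us-central1 \
    --enable-blue-green-upgrade \
    --node-pool-soak-duration=1800s

# Keep at least 2 replicas available during voluntary disruptions such as node drains.
kubectl create poddisruptionbudget my-app-pdb \
    --selector=app=my-app \
    --min-available=2

With blue-green, the new (green) nodes are created and soaked before the old (blue) nodes are drained, which, combined with enough replicas and a PDB, helps avoid a window where all pods are down at once.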

How can I check the GKE upgrade logs?

In Logs Explorer, you can run a query like 

resource.type="gke_nodepool"
(log_id("cloudaudit.googleapis.com/activity") OR log_id("cloudaudit.googleapis.com/data_access"))
protoPayload.methodName:("UpdateNodePool" OR "UpdateClusterInternal")
resource.labels.cluster_name="YOUR_CLUSTER_NAME"
resource.labels.nodepool_name="YOUR_NODEPOOL_NAME"

for nodepool upgrades and a query like 

resource.type="gke_cluster"
(log_id("cloudaudit.googleapis.com/activity") OR log_id("cloudaudit.googleapis.com/data_access"))
protoPayload.methodName:("UpdateCluster" OR "UpdateClusterInternal")
(protoPayload.metadata.operationType="UPGRADE_MASTER"
  OR protoPayload.response.operationType="UPGRADE_MASTER")
resource.labels.cluster_name="YOUR_CLUSTER_NAME"

for cluster upgrades.
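
If you prefer the CLI, roughly the same node pool query can be run with gcloud logging read (a sketch; the filter mirrors the one above, and the freshness/limit values are arbitrary):

gcloud logging read '
  resource.type="gke_nodepool"
  (log_id("cloudaudit.googleapis.com/activity") OR log_id("cloudaudit.googleapis.com/data_access"))
  protoPayload.methodName:("UpdateNodePool" OR "UpdateClusterInternal")
  resource.labels.cluster_name="YOUR_CLUSTER_NAME"
  resource.labels.nodepool_name="YOUR_NODEPOOL_NAME"
' --freshness=7d --limit=20 --format=json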
