The pods on my GKE cluster restart randomly. Sometimes it happens after 7 days, sometimes it takes only 1-2 days. Upon checking the logs, I can see that the whole nodepool is drained and recreated, and then the pods are started on the new nodepool. I want to prevent this because it is happening in my production environment; although it is only down for about 15 seconds, we cannot take any downtime. Is there a way I can find the main reason why this is happening, and how can I prevent it in the future? I have set up an uptime check which is failing, and when I check the pods, all of them have been recreated on the new nodepool and the creation time is the same for all of them.
My best guess at this point is that you likely have auto-upgrade enabled for your node pools (which is the default).
You can check to see if any upgrades have occurred recently by running `gcloud container operations list`.
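For example, to narrow that list down to node upgrades with the newest first (the filter expression and sort field here are my assumptions about what you're after; adjust them for your cluster):

# List only node upgrade operations, newest first
gcloud container operations list \
    --filter="operationType=UPGRADE_NODES" \
    --sort-by="~startTime"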
Assuming this is the case, you'll likely want to configure Maintenance Windows and/or Maintenance Exclusions so that upgrades only happen when you can tolerate downtime. With Maintenance Exclusions, you can select the "no minor or node upgrades" scope to prevent upgrades for up to 6 months at a time, until the GKE version reaches end of support.
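As a rough sketch of what that looks like with gcloud (the exclusion name and dates below are placeholders, and you should double-check the flags against the current gcloud reference):

# Block minor and node upgrades for a fixed window (name and dates are placeholders)
gcloud container clusters update YOUR_CLUSTER_NAME \
    --add-maintenance-exclusion-name=block-node-upgrades \
    --add-maintenance-exclusion-start=2022-09-01T00:00:00Z \
    --add-maintenance-exclusion-end=2023-02-28T23:59:59Z \
    --add-maintenance-exclusion-scope=no_minor_or_node_upgrades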
On 2nd September my whole nodepool got recreated at 2:00 PM IST, and then at 5:00 PM my whole nodepool got recreated again. Why did the upgrade happen twice? I just want to disable every automatic upgrade; I will upgrade it myself. Can I use the blue-green upgrade option instead of the surge option? Will it prevent downtime?
It's odd that it was upgraded twice in such a short period. Any chance someone made a change to the nodepool configuration? There are a few properties that will result in a nodepool recreate if you change them.
In terms of disabling auto-upgrades, take a look at my post above. I'd suggest using a Maintenance Exclusion with the "no node upgrades" scope.
Blue / Green can help depending on the number of replicas you have, whether or not you have a PDB (Pod Disruption Budget) set, etc.
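A minimal sketch of both pieces, assuming the blue-green flags on node-pools update and a simple PDB (the soak duration, selector, and names are placeholders for your workload):

# Switch the node pool to the blue-green upgrade strategy
gcloud container node-pools update YOUR_NODEPOOL_NAME \
    --cluster=YOUR_CLUSTER_NAME \
    --enable-blue-green-upgrade \
    --node-pool-soak-duration=1800s

# Keep at least one replica of the workload available while nodes drain
kubectl create poddisruptionbudget my-app-pdb \
    --selector=app=my-app \
    --min-available=1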
How can I check the upgrade logs of GKE?
In Logs Explorer, you can run a query like
resource.type="gke_nodepool"
(log_id("cloudaudit.googleapis.com/activity") OR log_id("cloudaudit.googleapis.com/data_access"))
protoPayload.methodName:("UpdateNodePool" OR "UpdateClusterInternal")
resource.labels.cluster_name="YOUR_CLUSTER_NAME"
resource.labels.nodepool_name="YOUR_NODEPOOL_NAME"
for nodepool upgrades and a query like
resource.type="gke_cluster"
(log_id("cloudaudit.googleapis.com/activity") OR log_id("cloudaudit.googleapis.com/data_access"))
protoPayload.methodName:("UpdateCluster" OR "UpdateClusterInternal")
(protoPayload.metadata.operationType="UPGRADE_MASTER"
OR protoPayload.response.operationType="UPGRADE_MASTER")
resource.labels.cluster_name="YOUR_CLUSTER_NAME"
for cluster upgrades.
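If you prefer the CLI, the same nodepool query can also be run with gcloud logging read (the freshness window and result limit are placeholders):

# Pull recent nodepool upgrade audit entries from the command line
gcloud logging read '
  resource.type="gke_nodepool"
  log_id("cloudaudit.googleapis.com/activity")
  protoPayload.methodName:"UpdateNodePool"
  resource.labels.cluster_name="YOUR_CLUSTER_NAME"
' --freshness=7d --limit=20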