Hi,
My GKE Autopilot cluster was created on version `1.27.3-gke.100` and has been updated to `1.30.2-gke.1587003`, which is supposed to have pod bursting re-enabled according to https://cloud.google.com/kubernetes-engine/docs/how-to/pod-bursting-gke#availability-in-gke.
BTW, all worker nodes are on version v1.30.2-gke.1587003 too.
However, pods still seem to be in the Guaranteed QoS class, even the test pod from the doc.
```
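# kdp: presumably a shell alias for 'kubectl describe pod'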
kdp helloweb-5b78557f66-s45gc | grep QoS
QoS Class: Guaranteed
```
Can someone help me figure out what's going on there? Thanks
Can you share your deployment spec?
Sure. I just use the example in the doc: https://cloud.google.com/kubernetes-engine/docs/how-to/pod-bursting-gke#deploy-burstable-workload
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloweb
  labels:
    app: hello
spec:
  selector:
    matchLabels:
      app: hello
      tier: web
  template:
    metadata:
      labels:
        app: hello
        tier: web
    spec:
      nodeSelector:
        pod-type: "non-critical"
      tolerations:
      - key: pod-type
        operator: Equal
        value: "non-critical"
        effect: NoSchedule
      containers:
      - name: hello-app
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 250m
          limits:
            cpu: 350m
```
+1, we're having the exact same issue; we ran the same tests and got the same results.
I'm having the same problem with v1.30.2-gke.1587003.
My cluster was created in v1.29.6-gke.1326000, then upgraded to v1.30.2-gke.1587003.
Node version is also v1.30.2-gke.1587003.
However, after following the documentation, the QoS class for the helloweb pod still turns out to be "Guaranteed".
@Simelvia @jastes @yanqiang in the Limitations section of the doc, there are instructions to manually restart the control plane, which must happen after all your nodes run a supported version. Could you confirm whether you've manually restarted the control plane after the version upgrade completed on your nodes? Just to check, could you try doing that once more and redeploy the Pod to see if that works?
Hi @shannduin, the doc didn't mention how to actually trigger the manual restart. It only mentions `kubectl get nodes`, which I did, and all nodes are on the right version.
The section that I linked to, "Limitations", has the instructions: basically you need to `gcloud container clusters upgrade --master` the cluster to the same GKE version that it's already on, which will trigger a control plane restart 🙂
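For reference, a sketch of that same-version "upgrade" (CLUSTER_NAME and LOCATION are placeholders; substitute the version your cluster is already on):

```
# Check the current control plane version
gcloud container clusters describe CLUSTER_NAME --location=LOCATION \
    --format="value(currentMasterVersion)"

# "Upgrade" the control plane to the version it already runs to force a restart
gcloud container clusters upgrade CLUSTER_NAME --location=LOCATION \
    --master --cluster-version=1.30.2-gke.1587003
```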
Thanks. I've upgraded the k8s cluster to an even newer version, and I guess that restarted the control plane. Now pod bursting is working. Thanks!
Thank you for your advice, @shannduin, my deployment is Burstable now. But there is still a problem with the reason we needed bursting in the first place: we wanted to allocate smaller resources for our many micro-deployments, and that still seems impossible. I applied the exact same file described in the docs for a sample burstable workload, but specified smaller resources:
```
requests:
  cpu: 25m
  memory: 128Mi
limits:
  cpu: 50m
  memory: 256Mi
```
Nevertheless, Autopilot automatically modifies them to much larger values:
```
autopilot.gke.io/resource-adjustment: '{
  "input": {"containers": [{
    "limits": {"cpu": "50m", "ephemeral-storage": "1Gi"},
    "requests": {"cpu": "25m", "ephemeral-storage": "1Gi", "memory": "512Mi"},
    "name": "hello-app"}]},
  "output": {"containers": [{
    "limits": {"cpu": "500m", "ephemeral-storage": "1Gi"},
    "requests": {"cpu": "500m", "ephemeral-storage": "1Gi", "memory": "512Mi"},
    "name": "hello-app"}]},
  "modified": true}'
```
Why does that happen, and how can I overcome it? Thank you in advance.
Specifying requests of at least 50m CPU might help. As specified in Resource requests in Autopilot#MinimumAndMaximum, (50m CPU, 52MiB memory) is the minimum request for the general-purpose compute class.
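For example, a sketch of patching the sample down to those minimums (the limit values here are illustrative, and helloweb is the Deployment name from the sample):

```
# Patch the sample Deployment down to the documented minimums,
# keeping limits above requests so the pod stays Burstable
kubectl patch deployment helloweb --type=json -p='[
  {"op": "replace",
   "path": "/spec/template/spec/containers/0/resources",
   "value": {"requests": {"cpu": "50m", "memory": "52Mi"},
             "limits": {"cpu": "100m", "memory": "104Mi"}}}]'
```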
Following @shannduin's instructions, I was able to request 50m CPU & 52MiB memory:
1. Upgrade the Autopilot cluster.
2. The nodes will be auto-upgraded.
3. Do step 1 again to manually restart the control plane (commands sketched below).
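A rough sketch of how to verify steps 2 and 3 (POD_NAME is a placeholder for your own pod):

```
# After step 2, confirm every node reports the expected GKE version
kubectl get nodes

# After step 3, redeploy the workload and check that the QoS class changed
kubectl describe pod POD_NAME | grep "QoS Class"
```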
Yup, this is correct
This is super annoying, we've been at it for a couple of days now.
Issue:
On deploying a burstable pod, we get the Autopilot resource-adjustment warning, and the CPU and memory limits are not respected.
QoS Class: Burstable
Our nodes version: v1.30.3-gke.1639000
Initial Version: 1.29.7-gke.1008000
Release Channel: Rapid
Answers:
Yes, we manually restarted the control plane after the upgrade to the latest node version, based on the suggestions by @shannduin.
We're using Google's pod example to test: https://cloud.google.com/kubernetes-engine/docs/how-to/pod-bursting-gke
What can we do to resolve this?
Any updates here?
There's a solution in this post
The one with the control plane restart that you shared? That didn't work for me. Can you point me to the solution you're referring to?
Wdym by the CPU and memory limits aren't respected? Did it adjust your limits to be equal to the requests? Could you post the modified manifest?
Yes, as soon as I deploy the pod (copied from the URL), I get an Autopilot mutator warning that the CPU resources have been adjusted to meet minimum requirements.
Here's the pod it creates: https://gist.github.com/thesrs02/b4ebbce82340d82b140db2595bf3b840
Hey, I gave this a go and confirmed it. If I manually adjust the request to `500m` and set the limit to a higher value like `750m`, it works as expected. I'll check if there's an explanation and get back to you.
I'm trying to set it to 50m or 250m, not 500m. I know it won't throw a warning on 500m.
I get that, still needed to check to be sure
It seems to have resolved on its own for some reason. Quick question: by default, will every pod be in the Burstable class?
Only if your limits are different from your requests. If you explicitly set requests == limits, they'll be Guaranteed QoS.
https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-resource-requests#resource-limits
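A quick way to check a given pod's class (POD_NAME is a placeholder):

```
# Prints Guaranteed, Burstable, or BestEffort
kubectl get pod POD_NAME -o jsonpath='{.status.qosClass}'
```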
But it's weird that it resolved itself. Did you do anything differently from the first time?
@rehan2 so the defaulting happened because the workload in the doc had a nodeSelector and a toleration, which means that it uses workload separation. Autopilot enforces higher minimums (500m CPU) for workload separation (see https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-resource-requests#workload-separa...). So it was working as intended. We'll update the doc to remove that from the manifest, since the example Pod's requests are <500m CPU.
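For anyone following along, here's a sketch of the sample Deployment with the workload-separation fields removed (nodeSelector and tolerations dropped, everything else kept from the sample), which should avoid the 500m minimum:

```
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloweb
  labels:
    app: hello
spec:
  selector:
    matchLabels:
      app: hello
      tier: web
  template:
    metadata:
      labels:
        app: hello
        tier: web
    spec:
      containers:
      - name: hello-app
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 250m
          limits:
            cpu: 350m
EOF
```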
@rehan2 just closing it off here, updated https://cloud.google.com/kubernetes-engine/docs/how-to/pod-bursting-gke#deploy-burstable-workload so that the manifest doesn't use workload separation