We recently moved to GKE Autopilot knowing it's more secure, more stable, and has less management overhead since there are no node pools to manage. We are seeing those benefits as expected, but the biggest issue is that our cost has gone up 2-3 times. The reason is that every container in every Pod consumes 0.5 vCPU and 512 MB of memory, which is the minimum default for the Balanced compute class. For most of our apps the request is set to 100m vCPU and 128 MB, and the limits CAN go higher. Now, I know we cannot set requests below those minimums in Autopilot, but using fewer resources is exactly what we were looking for.
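For reference, one of our typical containers is specified roughly like this (the name and image below are placeholders, not our real service):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mysvc                               # placeholder name
spec:
  containers:
  - name: app
    image: gcr.io/my-project/mysvc:latest   # placeholder image
    resources:
      requests:
        cpu: 100m        # what the app actually needs
        memory: 128Mi
      limits:
        cpu: 500m        # limits can go higher
        memory: 512Mi
# Autopilot raises requests that are below the compute class minimum up to that
# minimum, so each container ends up billed at 0.5 vCPU / 512 MB.
```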
After researching a bit, I saw that we can use a nodeSelector on a Pod to specify a compute class. The Performance compute class seems to be the one for my needs, as it allows a minimum of 1m CPU and 1 MiB of memory. For a given Pod I tried setting it as:
```yaml
nodeSelector:
  cloud.google.com/compute-class: Performance
```
After this we started seeing this error.
Failed to save resource: admission webhook "warden-validating.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more constraints. Violations details: {"[denied by autopilot-compute-class-limitation]":["the specified 'cloud.google.com/compute-class:Performance' is not supported. Deployment 'XXX.mysvc'."]} Requested by user: 'XXXXXXX', groups: 'system:authenticated'.
After researching further, I stumbled upon this link: https://cloud.google.com/kubernetes-engine/docs/how-to/performance-pods. As explained there, we tried this example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: performance-pod
spec:
  nodeSelector:
    cloud.google.com/compute-class: Performance
    cloud.google.com/machine-family: c3
    cloud.google.com/gke-ephemeral-storage-local-ssd: "true"
  containers:
  - name: my-container
    image: "k8s.gcr.io/pause"
    resources:
      requests:
        cpu: 100m
        memory: "128Mi"
        ephemeral-storage: "1Gi"
```
Then I saw this error.
Violations details: {"[denied by autogke-node-affinity-selector-limitation]":["Key 'cloud.google.com/machine-family' is not allowed with node selector; Autopilot only allows labels with keys: cloud.google.com/compute-class,cloud.google.com/gke-spot,cloud.google.com/gke-placement-group,topology.kubernetes.io/region,topology.kubernetes.io/zone,failure-domain.beta.kubernetes.io/region,failure-domain.beta.kubernetes.io/zone,cloud.google.com/gke-os-distribution,kubernetes.io/os,kubernetes.io/arch,cloud.google.com/private-node,sandbox.gke.io/runtime,cloud.google.com/gke-accelerator,cloud.google.com/gke-accelerator-count,iam.gke.io/gke-metadata-server-enabled."],"[denied by autopilot-compute-class-limitation]":["the specified 'cloud.google.com/compute-class:Performance' is not supported. Deployment 'xxx.xxxx'."]} Requested by user: 'xxxx', groups: 'system:authenticated'.
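For reference, keeping only keys from that allow-list, a version of the spec would look like this (my own adaptation, not from the doc; it only addresses the node-selector violation, not the compute-class one):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: performance-pod
spec:
  nodeSelector:
    # cloud.google.com/compute-class is in the allowed-keys list above;
    # cloud.google.com/machine-family and the local-SSD label are dropped.
    cloud.google.com/compute-class: Performance
  containers:
  - name: my-container
    image: "k8s.gcr.io/pause"
    resources:
      requests:
        cpu: 100m
        memory: "128Mi"
        ephemeral-storage: "1Gi"
```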
Now my question is: if I want to deploy a Pod that has multiple containers, and they should each be able to request 128 MB and 100m CPU, can I do that using ANY compute class in GKE Autopilot? If yes, any examples or links would be much appreciated. Thank you.
If cost savings is your goal, performance might not be for you. You pay for the compute engine machine that you run on as well as the Autopilot premium.
For Performance, you're seeing an error potentially because your cluster isn't running a supported GKE version. The minimum for Performance is low because it supports bursting into the full VM resources.
I'd recommend running your Pods on the default class (so don't specify a compute class at all). Right now, the minimum for that is 0.25 CPU, and the memory minimum is still 512 MB, but it'll be cheaper than Performance imo for your use case.
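For example (just a sketch, not an official template), a Pod with no compute-class selector lands on the default general-purpose class, and requesting the class minimums directly keeps the billed amount predictable:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: default-class-pod   # example name
spec:
  # No cloud.google.com/compute-class nodeSelector, so this runs on the
  # default (general-purpose) compute class.
  containers:
  - name: app
    image: k8s.gcr.io/pause
    resources:
      requests:
        cpu: 250m      # current default-class CPU minimum
        memory: 512Mi  # current default-class memory minimum
```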
Thank you, shannduin, for your reply.
Where can I find the compatibility matrix for the different compute classes and Kubernetes versions?
I ran some calculations based on this: https://cloud.google.com/spot-vms/pricing
For a `c3-standard-4` Spot instance there are 4 vCPUs and 16 GB RAM.
The cost of 1 vCPU/hr in us-central1 is `0.00835`, so for 1 month: 0.00835 * 4 * 24 * 30 = `24.048`
The cost of 1 GB/hr in us-central1 is `0.001119`, so for 1 month: 0.001119 * 16 * 24 * 30 = `12.89088`
That comes to about $37 per month per VM. We know we would need about 8 such VMs to run our workload, so roughly $300.
What we have seen so far is that FORECASTED costs are around $500 per month for the given Autopilot cluster. Please let me know if I am missing something.
Regardless, we would like to run our workloads on the Performance compute class. It would be helpful at least to know whether we can do it, for future reference. Thanks.
https://cloud.google.com/kubernetes-engine/docs/how-to/performance-pods has it in the Before You Begin section 🙂 You can combine the Performance class with Spot Pods (as detailed in the Compatibility with other GKE features section in that doc), so that might be a good way to cost-optimize.
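Something along these lines, for example (just a sketch; both selector keys are in the allow-list from your earlier error message):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: performance-spot-pod   # example name
spec:
  nodeSelector:
    cloud.google.com/compute-class: Performance
    cloud.google.com/gke-spot: "true"    # run the Performance Pod on Spot capacity
  containers:
  - name: my-container
    image: k8s.gcr.io/pause
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
```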
I additionally found something: there is a separate pricing page for Spot Autopilot Pods.
Based on this pricing page: https://cloud.google.com/kubernetes-engine/pricing
I can see that the Performance compute class incurs an Autopilot premium, which would be around $5 extra:
General-purpose compute class = (0.0133 * 4 * 24 * 30) + (0.0014767 * 16 * 24 * 30) = 55.315584
Performance compute class premium = (0.0012 * 4 * 24 * 30) + (0.00015 * 16 * 24 * 30) = 5.184
But I am still finding Performance cheaper. Any pointers would be much appreciated. Thanks.
Well I *think* the way it'd work out is (assuming that your Pods will request 32 CPU/128 GB total across all Pods):
Performance class (us-central1): the approximate total is ~$418.24 a month.
General-purpose (us-central1): the total would be about $520.96 a month.
But keep in mind that you're being charged for the VMs in compute engine regardless of whether your Pods are fully using the VM resources, whereas in general-purpose you're getting charged only for the actual resource requests of your Pods. So Performance is cheaper than general-purpose if you plan on your Pod resource requests always totalling up to 32 CPU / 128 GB, but if your resource requests are going to be lower during normal operations, then general purpose might work out to be cheaper for you. Also note that with Performance class, GKE puts each Pod on its own node. So if you have 8 Pods, you'll get 8 VMs, but if you have fewer Pods, you get fewer VMs (and the pricing changes).
Again, big disclaimer that I'm not a pricing person so this could be completely wrong.
Another thing that you seem to want is the ability for your Pods to burst beyond their requests. This is supported in Performance, and will at some point be available in other compute classes as well (https://cloud.google.com/kubernetes-engine/docs/how-to/pod-bursting-gke), so you'll be able to set low resource requests with no limits or higher limits, and pay for the requests. A sketch of what that looks like on Performance is below.
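Something like this, roughly (a sketch, assuming your cluster is on a GKE version that supports the Performance class):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bursting-pod   # example name
spec:
  nodeSelector:
    cloud.google.com/compute-class: Performance
  containers:
  - name: app
    image: k8s.gcr.io/pause
    resources:
      requests:
        cpu: 100m        # low steady-state request
        memory: 128Mi
      limits:
        cpu: "2"         # higher limits let the container burst above its requests
        memory: 1Gi
```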
The Performance document has the minimum required GKE version for running performance class 🙂
Thank you so much, shannduin. This clarifies a lot, especially this:
@shannduin wrote:But keep in mind that you're being charged for the VMs in compute engine regardless of whether your Pods are fully using the VM resources, whereas in general-purpose you're getting charged only for the actual resource requests of your Pods.