
Larger nodes in GKE Autopilot cluster to accommodate bursts of k8s jobs

I am running mostly Kubernetes Jobs, rather than long-running Deployments, in my GKE Autopilot cluster. As a result, most of the time there are no pods running (besides system pods) and only one node is available. But then often 4-5 Jobs come in at once and start up pods. That usually means not enough CPU resources are available, and hence the Autopilot cluster spins up additional nodes. By itself that introduces latency that I am OK with. But often there is a succession of bursts of Kubernetes Jobs at roughly the same time, so more and more nodes get spun up. Is it possible to somehow make sure the nodes that the GKE Autopilot cluster starts up are larger, so they can accommodate more pods for all those Jobs, instead of constantly starting up new nodes?

Regards,

Dolf.


The following strategies can help Autopilot provision larger nodes for bursty Job workloads:

  1. Pod Resource Requests and Limits:

Increase requests: Set higher CPU and memory requests for your Job pods; Autopilot provisions node capacity based on pod requests, so larger requests result in larger nodes.
Adjust limits (optional): Set appropriate limits to ensure fair resource allocation and prevent pods from overconsuming resources.
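
As a sketch, a Job manifest with explicit requests might look like the following (the Job name, image, and resource values are illustrative placeholders, not values from the original post):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: burst-job            # hypothetical name
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: gcr.io/my-project/worker:latest   # placeholder image
          resources:
            requests:
              cpu: "2"       # higher requests signal Autopilot to provision more capacity
              memory: 4Gi
            limits:
              cpu: "2"       # limits kept equal to requests
              memory: 4Gi
```

With several such Jobs arriving together, the summed requests give Autopilot a clearer picture of the capacity needed, rather than each small pod triggering a separate scale-up.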

  2. Resource Utilization Targets:

Lower Target: Decrease the targetUtilization field in the Autopilot configuration to favor larger nodes with more available resources.
Caution: This might lead to underutilized nodes during low-demand periods.

  3. Pod Topology Spread Constraints:

Apply Constraints: Use pod topology spread constraints to distribute pods across multiple nodes, reducing the likelihood of resource contention and frequent node scaling.
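
A minimal sketch of such a constraint in a Job's pod template (the `app: burst-job` label is a hypothetical label assumed to be set on the Job's pods):

```yaml
# Pod template fragment: spread matching pods evenly across nodes
spec:
  topologySpreadConstraints:
    - maxSkew: 1                              # at most 1 pod difference between nodes
      topologyKey: kubernetes.io/hostname     # treat each node as a separate domain
      whenUnsatisfiable: ScheduleAnyway       # prefer spreading, but do not block scheduling
      labelSelector:
        matchLabels:
          app: burst-job                      # hypothetical label on the Job's pods
```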

  4. Optimized Node Pool:

Consider switching: If Autopilot's behavior doesn't fully align with your workload patterns, consider a GKE Standard cluster with node pools, which gives you more granular control over machine types, node sizes, and scaling.
