We deployed a Dependency-Track backend on a Standard GKE cluster with 7 nodes totalling 56 vCPU and 224 GB RAM. The core component is the API server, which has a minimum container requirement of 2 vCPU and 4.5 GB RAM. It is intended to be a multi-tenant system (isolation via Kubernetes namespaces), so there is one API server instance per namespace. The issues we are seeing are below.
1. Cluster is highly underutilized: According to the GKE Cost Optimization dashboard (which may be misleading), the cluster is highly underutilized. When users are active on the system we definitely see CPU and memory usage increase, but it still does not justify the level of underutilization. How do we fix this?
2. Autoscaling issues: We are running pretty much default settings for the Cluster Autoscaler and node autoscaling. The DT API server (a Deployment, not a StatefulSet) uses a single PV, but that's about it.
Attaching all the images from the GKE cluster.
Hi,
IMHO, the (secret sauce) to address this is a combination of proactive resource management, smart scaling strategies, and ongoing monitoring and adjustment. Here is a course of action.
(1) Adjusting Pod Requests and Limits
You would need to update your Kubernetes deployment YAML to adjust the requests and limits:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3 # Adjust this number based on your scaling needs
  selector: # Required for apps/v1 Deployments; must match the template labels
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api-server
          image: dependencytrack/apiserver
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: "3" # Adjusted from 4 to 3 based on observed usage
              memory: "8Gi" # Adjusted from 16Gi to 8Gi
            requests:
              cpu: "1" # Adjusted from 2 to 1
              memory: "4Gi" # Adjusted from 5Gi to 4Gi
          volumeMounts:
            - mountPath: /data
              name: dependency-track
      volumes:
        - name: dependency-track
          persistentVolumeClaim:
            claimName: <your-pvc-name>
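Since right-sizing from dashboards alone is hard, one option is to let a VerticalPodAutoscaler in recommendation-only mode (updateMode: "Off") watch actual usage and suggest requests without touching running pods. A minimal sketch, assuming vertical Pod autoscaling is enabled on the GKE cluster and the Deployment is named api-server:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off" # Recommend only; never evict or resize pods

Running kubectl describe vpa api-server-vpa then shows recommended CPU/memory requests that you can feed back into the Deployment above.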
(2) Implement Horizontal Pod Autoscaler (HPA)
Set up an HPA to scale based on CPU and memory usage. The imperative command below covers the CPU target; memory needs a manifest, shown right after it.
kubectl autoscale deployment api-server --cpu-percent=50 --min=1 --max=10
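To also scale on memory, you would apply an autoscaling/v2 HorizontalPodAutoscaler manifest instead of the imperative command. A minimal sketch, assuming the Deployment is named api-server and the thresholds are just examples to tune:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50 # Matches the --cpu-percent above
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70 # Example target; tune to observed usage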
(3) Using Affinity and Anti-Affinity
To set up pod affinity and anti-affinity, you would modify your deployment configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - api-server
              topologyKey: "kubernetes.io/hostname"
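One caveat: required anti-affinity on the hostname topology key allows at most one replica per node (7 on this cluster) and leaves pods Pending if no node qualifies. If that is too strict, a preferred (soft) variant spreads replicas best-effort; here is the same affinity block sketched that way:

      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - api-server
                topologyKey: "kubernetes.io/hostname"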
(4) Setting Budgets and Cost Allocation Tags
To create budgets and set up cost alerts, you would use the GCP console or gcloud CLI, not Kubernetes configuration files. However, to use labels for cost allocation:
kubectl label pods <pod-name> team=finance
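Keep in mind that labels applied with kubectl label disappear when the pod is recreated, so for cost allocation it is more durable to put them in the Deployment's pod template. A minimal sketch (the team value is just an example):

spec:
  template:
    metadata:
      labels:
        app: api-server
        team: finance # Example cost-allocation label; use whatever maps to your tenants/namespaces

GKE's cost allocation feature also needs to be enabled on the cluster, if I remember correctly, for these labels to surface in the billing data.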
(5) Configure Cluster Autoscaler
Ensure your cluster autoscaler is set up correctly for your node pools. Here is the Terraform (IaC):
resource "google_container_cluster" "primary" {
# ... other cluster specs
node_pool {
# ... other node pool specs
autoscaling {
min_node_count = 1
max_node_count = 10
}
}
}
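If you would rather not go through Terraform for an already-running cluster, the same can be done with gcloud (cluster, pool, and zone names below are placeholders):

gcloud container clusters update <cluster-name> \
  --node-pool=<pool-name> --zone=<zone> \
  --enable-autoscaling --min-nodes=1 --max-nodes=10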
(6) Optimize Node Pool Management
Create multiple node pools for different workload needs.
resource "google_container_node_pool" "secondary" {
# ... other node pool specs
autoscaling {
min_node_count = 1
max_node_count = 5
}
management {
auto_repair = true
auto_upgrade = true
}
node_config {
# Specify a different machine type, disk size, etc.
}
}
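To actually steer the API server onto a specific pool, you can select on the node-pool label GKE adds to every node (cloud.google.com/gke-nodepool). A minimal sketch, assuming the second pool is named secondary:

spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: secondary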
(7) Monitor Autoscaling Events
Review the autoscaling events to understand the scaling behavior.
gcloud logging read "resource.type=\"k8s_cluster\" AND jsonPayload.message: \"ClusterAutoscaler\"" --limit 10 --format "table(timestamp, jsonPayload.message)"
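For the pod-level side, the HPA's own events and current metrics are visible straight from kubectl, which is often quicker than Cloud Logging (the HPA name matches the autoscale command above):

kubectl describe hpa api-server
kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler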
I hope it helps
Best Regards
Mahmoud
Thanks @mahmoudrabie. On (1): right-sizing the workloads, or even understanding the resource usage, is the biggest challenge; it's not easy.
100% agree, and this is the common pain: the absence of adequate performance analysis and the lack of proactive monitoring.