Hi all
I was configuring alerts in the Google Cloud console using a PromQL query. I wanted to get pod restart alerts, so I used the following PromQL query:
sum by(pod_name)( kubernetes_io:container_restart_count{monitored_resource="k8s_container"} - kubernetes_io:container_restart_count{monitored_resource="k8s_container"} offset 10m ) > 0
The problem is that it keeps alerting for the same pod restart, continuously, every time there is a restart.
What I wanted was a gap of at least 10 minutes between alerts so that we don't get bombarded with them. But we are getting bombarded right now.
alert trigger value: Any time series violates
retest window: 0secs
evaluation interval: 1min
incident autoclose duration: 30mins
Hi @litoco,
Welcome to Google Cloud Community!
You're right, the issue lies in the PromQL query you're using. It's designed to trigger an alert every time the pod restart count changes, even if it's just a single increment.
Here's a breakdown of the changes:
increase(kubernetes_io:container_restart_count{monitored_resource="k8s_container"}[10m]): This calculates the increase in the restart count over the past 10 minutes.
> 0: This condition ensures the alert triggers only when there has been at least one pod restart within the 10-minute interval.
Additional Considerations:
You can filter on labels such as pod_name if you want to focus on specific pods or specific namespaces.
By making these changes, you should achieve your goal of receiving a single alert for each pod restart with a 10-minute delay between alerts.
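Putting the pieces above together, the full alert condition might look something like the sketch below. It keeps the sum by(pod_name) grouping from the original question and adds an optional namespace filter. The namespace value "my-namespace" is a made-up placeholder, and the namespace_name label is assumed to be available on the k8s_container resource in your project, so adjust or drop that matcher as needed.

```promql
# Sketch only: "my-namespace" is a hypothetical value, not taken from the thread.
# Fires for any pod whose containers restarted at least once in the last 10 minutes.
sum by (pod_name) (
  increase(
    kubernetes_io:container_restart_count{
      monitored_resource="k8s_container",
      namespace_name="my-namespace"
    }[10m]
  )
) > 0
```

Because the condition looks back over a 10-minute window, it should stay true for roughly that long after a restart, instead of turning true again at every evaluation that sees a new count. How incidents open and close still depends on the policy settings listed in the question (retest window, auto-close duration).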
I hope the above information is helpful.