
GCP console alerts

Hi all

I was configuring alerts in the Google Cloud console using a PromQL query. I wanted to get pod restart alerts, so I used the following PromQL query:

sum by (pod_name) (
  kubernetes_io:container_restart_count{monitored_resource="k8s_container"}
  -
  kubernetes_io:container_restart_count{monitored_resource="k8s_container"} offset 10m
) > 0

The problem is that it keeps alerting continuously for the same pod restart, every time there is a restart.

What I wanted was a gap of at least 10 minutes between alerts so that we don't get bombarded with them. But right now we are getting bombarded.

Other details:

alert trigger value: Any time series violates

retest window: 0secs

evaluation interval: 1min

incident autoclose duration: 30mins


Hi @litoco,

Welcome to Google Cloud Community!

You're right, the issue lies in the PromQL query you're using. It's designed to trigger an alert every time the pod restart count changes, even if it's just a single increment.

Here's a breakdown of the changes, which replace the offset subtraction with the increase function:

  • increase(kubernetes_io:container_restart_count{monitored_resource="k8s_container"}[10m]): This calculates the increase in the restart count over the past 10 minutes.
  • > 0: This condition ensures the alert triggers only when there has been at least one pod restart within the 10-minute interval.
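
Putting the two pieces above together, the full condition would look roughly like the sketch below. The sum by (pod_name) wrapper is an assumption carried over from the original query, and the label is written as monitored_resource (singular), which is the spelling Cloud Monitoring's PromQL support uses for selecting a resource type.

```promql
# Sketch only: the sum by (pod_name) grouping is assumed from the original query.
sum by (pod_name) (
  increase(kubernetes_io:container_restart_count{monitored_resource="k8s_container"}[10m])
) > 0
```

Note that the condition stays true for as long as a restart falls inside the 10-minute window, so it is worth checking how it interacts with the retest window and incident autoclose settings of the alert policy.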

Key points:

  • This query now uses the increase function to track changes in the restart count over time, providing a more accurate representation of pod restarts occurring within a specific timeframe.
  • The [10m] argument specifies the 10-minute window for calculating the increase, ensuring a minimum gap between alerts.
  • This solution respects your need for immediate pod restart detection while preventing alert bombardment.

Additional Considerations:

  • Tuning the 10-minute window: You can adjust the [10m] value to fine-tune the alert frequency based on your specific needs.
  • Alerting on specific pods: You can filter the query by pod_name or namespace_name if you want to focus on specific pods or specific namespaces.
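
As an illustration of the filtering mentioned above, a narrowed-down condition might look like the sketch below. The namespace "production" and the pod name pattern "checkout-.*" are hypothetical placeholders, not values from your cluster; namespace_name and pod_name are labels of the k8s_container monitored resource.

```promql
# Sketch only: "production" and "checkout-.*" are made-up example values.
sum by (pod_name) (
  increase(kubernetes_io:container_restart_count{
    monitored_resource="k8s_container",
    namespace_name="production",
    pod_name=~"checkout-.*"
  }[10m])
) > 0
```

The =~ matcher takes a regular expression; use = instead if you want to match one exact pod name.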

By making these changes, you should achieve your goal of receiving a single alert for each pod restart with a 10-minute delay between alerts.

I hope the above information is helpful.
