Auto-close of an incident created by a GCP alert

I am currently working on GCP alerting for Apigee hybrid. I created an alert that checks for missing traffic in the production environment, and I set its auto-close duration to 1 day. In one case, an incident for this alert condition was closed after 1 day 18 hours. I don't understand this: if the auto-close duration is 1 day, why did the incident close only after 1 day 18 hours?

Solved
1 ACCEPTED SOLUTION

Good day and welcome to the community!

According to the Google Cloud Monitoring documentation, when time-series data stops arriving or is delayed, Monitoring classifies the data as missing. Missing data can therefore result in policies not alerting and in incidents not closing.

Delays in data can happen for several reasons, including network issues and delays in processing the data. When an incident is open and no data arrives, Monitoring closes the incident only after the auto-close duration plus 24 hours. For conditions that are no longer met, this setting keeps the metric-threshold condition open for the specified duration and then closes it automatically once that duration has elapsed. With a 1-day auto-close duration, an incident can therefore remain open for up to 2 days, and a close at 1 day 18 hours falls inside that window.
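If you want to confirm what auto-close duration is actually configured on the policy, you can read it back (and adjust it) through the Cloud Monitoring API. Below is a minimal sketch using the google-cloud-monitoring Python client; the PROJECT_ID and POLICY_ID values are placeholders you would replace with your own.

```python
import datetime

from google.cloud import monitoring_v3
from google.protobuf import field_mask_pb2

# Placeholders - replace with your project and alerting policy IDs.
PROJECT_ID = "my-project"
POLICY_ID = "1234567890123456789"

client = monitoring_v3.AlertPolicyServiceClient()
policy_name = f"projects/{PROJECT_ID}/alertPolicies/{POLICY_ID}"

# Read the policy and print the configured auto-close duration.
policy = client.get_alert_policy(name=policy_name)
print("Current auto-close:", policy.alert_strategy.auto_close)

# Optionally set the auto-close duration to 1 day.
policy.alert_strategy.auto_close = datetime.timedelta(days=1)
updated = client.update_alert_policy(
    alert_policy=policy,
    update_mask=field_mask_pb2.FieldMask(paths=["alert_strategy.auto_close"]),
)
print("New auto-close:", updated.alert_strategy.auto_close)
```

As far as the documentation describes, the extra 24-hour wait when data is missing is Monitoring's own behavior and is not part of the policy configuration itself.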

You may also look into the following, as they may contribute to the delay:

  1. Intermittent Issues: If the issue that triggered the incident is still intermittently occurring, it might prevent the auto-closure from kicking in. Remember, the auto-close duration begins only after the issue has been resolved and remains clear for the specified period.
  2. Metrics Reporting Delays: There might be a delay in the reporting of metrics, which could affect when an incident is seen as resolved (see the sketch after this list for one way to check when data last arrived).
  3. Time Zone Differences: Check to see if there's a time zone difference between your local time and the server time that could be a factor for the perceived delay.
  4. Alerting System Delays: In some cases, there might be delays in GCP internal alerting systems.
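To rule out metric reporting delays (item 2 above), it can help to check when the last data points for the alerting metric actually arrived and compare that with the incident close time. The sketch below uses the Cloud Monitoring Python client; PROJECT_ID is a placeholder and the Apigee metric type shown is an assumption, so substitute whatever metric your missing-traffic condition actually uses.

```python
import time

from google.cloud import monitoring_v3

# Placeholders - replace with your project and the metric used by your condition.
PROJECT_ID = "my-project"
METRIC_TYPE = "apigee.googleapis.com/proxyv2/request_count"  # assumption, adjust as needed

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": now},
        "start_time": {"seconds": now - 48 * 3600},  # look back 48 hours
    }
)

# List the raw time series for the metric and print the newest point per series.
results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": f'metric.type = "{METRIC_TYPE}"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    if series.points:
        # Points are returned newest first, so the first point is the latest sample.
        latest = series.points[0].interval.end_time
        print(dict(series.resource.labels), "last point at", latest)
```

If the latest samples lag well behind real time, that ingestion delay would also push out when Monitoring considers the incident resolved.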

I have attached documentation links for your use case and I hope they help. [1][2][3]

[1] https://cloud.google.com/monitoring/alerts/concepts-indepth

[2] https://cloud.google.com/logging/docs/alerting/log-based-alerts

[3] https://cloud.google.com/monitoring/alerts/using-alerting-ui

