Site Uptime Failure Monitoring

I have some automated alerts set up in Google Cloud to notify me about site-down issues. I’ve received a few today and yesterday that seemed fairly random, and the alert and resolution emails arrived close together.

It doesn’t appear that there’s been any significant impact to the site, but it’s not normal behavior and so I’m concerned there could be a problem.

For reference, I’ve listed the alert and resolution times (EST) for all the emails I got today and yesterday below:

Alert - 8/10 11:08 AM

Resolved - 8/10 11:10 AM

Alert - 8/10 10:33 PM

Resolved - 8/10 10:35 PM

Alert - 8/10 10:53 PM

Resolved - 8/10 10:55 PM

Alert - 8/11 12:53 PM

Resolved - 8/11 12:55 PM

Alert - 8/11 1:38 PM

Resolved - 8/11 1:40 PM
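
As a quick check on how long each incident stayed open, the alert/resolution pairs listed above can be run through a short script. This is only a minimal sketch: the year 2023 is an assumption taken from the screenshot filenames, and the helper names `parse` and `incident_minutes` are made up for illustration.

```python
from datetime import datetime

# Alert/resolution pairs copied from the list above.
# The year is an assumption (taken from the screenshot filenames).
INCIDENTS = [
    ("8/10 11:08 AM", "8/10 11:10 AM"),
    ("8/10 10:33 PM", "8/10 10:35 PM"),
    ("8/10 10:53 PM", "8/10 10:55 PM"),
    ("8/11 12:53 PM", "8/11 12:55 PM"),
    ("8/11 1:38 PM", "8/11 1:40 PM"),
]


def parse(stamp, year=2023):
    """Parse a 'month/day hour:minute AM/PM' stamp into a datetime."""
    return datetime.strptime(f"{year} {stamp}", "%Y %m/%d %I:%M %p")


def incident_minutes(incidents):
    """Return how many whole minutes each incident stayed open."""
    return [
        int((parse(end) - parse(start)).total_seconds() // 60)
        for start, end in incidents
    ]


if __name__ == "__main__":
    for (start, _), minutes in zip(INCIDENTS, incident_minutes(INCIDENTS)):
        print(f"{start}: open for {minutes} min")
```

Every incident in the list was open for the same short interval, which is worth comparing against the check frequency and duration configured in the alerting policy.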

Below is the JSON from Monitoring:

Screenshot 2023-08-11 at 6.32.39 PM.png

Note: I also have Pingdom, and Pingdom did not throw any alert for the site being down.

Screenshot 2023-08-11 at 6.26.58 PM.png

How can I get more information on what happened during that timeframe? Is this an issue on the Google Cloud side? Note that we have set filters for US and CA on the store.

Thanks,


Hello @ylgenguxholli,

Welcome to Google Cloud Community!

You might want to check the re-test strategy to re-verify the conditions.


You use the duration, or the duration window, to prevent a condition from triggering due to a single measurement or forecast. For example, assume that the duration field for a condition is set to 15 minutes. The following describes the behavior of the condition based on its type:

  • Metric-threshold conditions trigger when, for a single time series, every aligned measurement in a 15-minute interval violates the threshold.
  • Metric-absence conditions trigger when no data arrives for a time series in a 15-minute interval.
  • Forecast conditions trigger when every forecast produced during a 15-minute window predicts that the time series will violate the threshold within the forecast window.
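
To make the effect of the duration concrete, here is a minimal sketch that models the metric-threshold and metric-absence rules above. It is a simplified illustration, not how Cloud Monitoring actually evaluates policies: it assumes one aligned measurement per alignment period, and the function names and the `duration_points` parameter (duration divided by alignment period) are hypothetical.

```python
def threshold_condition_triggers(violations, duration_points):
    """Return True if `duration_points` consecutive aligned measurements
    all violate the threshold (simplified metric-threshold rule)."""
    run = 0
    for violated in violations:
        run = run + 1 if violated else 0
        if run >= duration_points:
            return True
    return False


def absence_condition_triggers(arrivals, duration_points):
    """Return True if no data arrives for `duration_points` consecutive
    alignment periods (simplified metric-absence rule)."""
    run = 0
    for arrived in arrivals:
        run = 0 if arrived else run + 1
        if run >= duration_points:
            return True
    return False


# Hypothetical series: one measurement per minute, with a 2-minute blip.
blip = [False] * 10 + [True] * 2 + [False] * 10

if __name__ == "__main__":
    # With a 15-minute duration the blip is ignored.
    print(threshold_condition_triggers(blip, duration_points=15))
    # With a 2-minute duration the same blip opens an incident.
    print(threshold_condition_triggers(blip, duration_points=2))
```

Under this model, a short-lived failure only opens an incident when the duration is no longer than the failure itself, so lengthening the duration is one way to filter out brief blips.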

For policies with one condition, an incident is opened and notifications are sent when its condition triggers. These incidents stay open while the condition continues to be met.