We use monitoring alerts to track uptime and latency and to spot outages in our services. Today we received an alert for an open issue, but it looks as though the monitoring service itself had an outage. The latency monitoring page recorded no data for about an hour and a half, and the regions that test latency showed a status of “no checks have run yet”. The tested service itself was functioning as expected during this time.
The monitoring is back up and running now, but is there a way to prevent an alerting policy from opening an issue when it is the monitoring that fails, rather than the services being tested?
We can look at the Cloud Monitoring status summary here:
https://status.cloud.google.com/summary
The status page doesn't report any outages in the last couple of weeks, which makes me doubt that a platform-wide outage is the explanation. It is certainly possible that Google Cloud had some form of outage, but I would hope that would show up on the status page. If Google Cloud didn't have a monitoring outage, then we can start looking for some other identifiable explanation. If you have a support contract, this feels like a good item for a support ticket, since the support engineers are authorized to look into your specific configuration and history.
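If you want to repeat the status-page check for a specific time window rather than eyeballing the summary, something like the sketch below might help. It is only an illustration under assumptions: I am assuming the status site exposes a JSON incident feed at the `FEED_URL` shown, and that each incident carries `begin`, `end`, `service_name` and `affected_products` fields. Please verify both against the live feed before relying on it. The helper names (`parse_ts`, `incidents_overlapping`, `fetch_incidents`) are my own, not part of any Google library.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumption: the status site publishes incidents as JSON at this URL.
FEED_URL = "https://status.cloud.google.com/incidents.json"


def parse_ts(value):
    """Parse an ISO 8601 timestamp; a missing value is returned as None."""
    if not value:
        return None
    # Older Python versions reject a trailing "Z", so normalise it first.
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts


def incidents_overlapping(incidents, start, end, product_keyword="Monitoring"):
    """Return incidents that mention product_keyword and overlap [start, end]."""
    matches = []
    for inc in incidents:
        begin = parse_ts(inc.get("begin"))
        finish = parse_ts(inc.get("end"))  # None is treated as still ongoing
        if begin is None:
            continue
        # Assumed field names: service_name and affected_products[].title.
        names = [inc.get("service_name") or ""]
        names += [p.get("title", "") for p in inc.get("affected_products") or []]
        if not any(product_keyword.lower() in n.lower() for n in names):
            continue
        if begin <= end and (finish is None or finish >= start):
            matches.append(inc)
    return matches


def fetch_incidents(url=FEED_URL):
    """Download and decode the incident feed (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical window; replace with the period where your checks had no data.
    window_start = datetime(2024, 1, 10, 11, 0, tzinfo=timezone.utc)
    window_end = datetime(2024, 1, 10, 12, 30, tzinfo=timezone.utc)
    for inc in incidents_overlapping(fetch_incidents(), window_start, window_end):
        print(inc.get("begin"), inc.get("service_name"))
```

An empty result would only tell you that nothing was published for that window, which is the same conclusion as reading the summary page by hand. It would not rule out a problem specific to your project, which is where a support ticket comes in.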