Hello,
I have a project and I want to monitor all my instances with GCP Cloud Monitoring.
But my instances are OFF at night, so I want to turn OFF my alerts at the same time. I noticed that a "snooze" can do this, but I have to set it up again every day.
Do you have something to help me?
Thanks all,
Howdy Loulie,
What a great question!! I looked at the docs on snooze and don't think that is what we want. A snooze is a timed disable of an alert that is (in my opinion) meant to be used when we know we are going to take down a monitored service. For example, I know that for the next 60 minutes I am going to be taking my production system out of service for expected maintenance. The snooze ensures that if I "forget" to turn the alert back on again, it will be turned on automatically after an hour.
In our story, I am hearing that we explicitly shut down our monitored service every night. This brings me to my first set of questions:
I studied a little more and found that alerting policies have an enabled/disabled property. This means that we can define a policy and switch it on and off. Where my mind is going now is that there is presumably a process/script/workflow being executed to shut down your service, and that we can augment it with a request to disable the alert policy and THEN shut down the service. Conversely, when the service is restarted, we start the service and then re-enable the alert policy. Depending on the nature of the service you are using, we can either perform these steps in the startup/shutdown command or ... and this will be dependent on your own designs ... have the started service ITSELF enable the alert at startup and disable it at shutdown (startup and shutdown scripts).
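To make the enable/disable idea concrete, here is a minimal sketch in Python of what such a request might look like. It assumes the Cloud Monitoring REST API's alertPolicies patch call with an update mask of "enabled", so that only that one field changes. The policy name, the access token, and the helper names build_toggle_request and set_policy_enabled are placeholders of mine, not anything from your project, so treat this as a starting point rather than a finished solution.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://monitoring.googleapis.com/v3"


def build_toggle_request(policy_name, enabled, access_token):
    """Build a PATCH request that changes only the policy's `enabled` field.

    policy_name is the full resource name, for example
    "projects/my-project/alertPolicies/1234567890" (hypothetical values).
    """
    # updateMask=enabled asks the API to leave every other field untouched.
    query = urllib.parse.urlencode({"updateMask": "enabled"})
    url = f"{API_ROOT}/{policy_name}?{query}"
    body = json.dumps({"enabled": enabled}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


def set_policy_enabled(policy_name, enabled, access_token):
    """Send the request and return the updated policy as a dict."""
    request = build_toggle_request(policy_name, enabled, access_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The shutdown workflow would call set_policy_enabled(name, False, token) just before stopping the instances, and the startup workflow would call set_policy_enabled(name, True, token) once they are running again. How the token is obtained depends on where the script runs; on a workstation, gcloud auth print-access-token is one option.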
Another thought ... and this again depends on what service you are starting/stopping ... we could potentially look at the policy itself. For example, I'm imagining that what you have today is some alert policy that says "Raise an alert if we fail a heartbeat 3 times" ... but maybe we can find a policy that says "Raise an alert if we fail a heartbeat 3 times AND the service has not been stopped".
Hi Kolban,
Thanks for this answer.
To answer your questions:
I want to monitor several things, like memory usage and disk usage. On this point I think I found the solution: I noticed that when I create an alert I can add a filter "State = Active". With it, I didn't receive any alerts while my instances were OFF.
But I want to monitor one last thing: an alert that pings the instance to know whether it is ON or OFF (outside the scheduled OFF time). GCP calls this "Uptime Checks". After several searches, I found the "snooze" solution to turn off the uptime check when I want, but as you said, it's not really what I'm looking for and it's not the best solution.
So, I'm taking note of your ideas. I'll do some research. Thank you!!