Monitoring persistent disk utilization for compute instances in Google Cloud can be challenging using standard log metrics, especially when multiple disks are added to a compute instance.
It’s important to have a solution that can monitor disk space and schedule alerts based on log metrics so you can track the performance of your systems and quickly identify potential problems.
By default, every Google Cloud compute instance has a boot disk attached. Users may attach additional disks as required for running their application workloads and storing application configuration.
To measure disk utilization for the root and additional disk volumes, you can run the df utility periodically and write the stats to custom logs using the gcloud logging commands. The custom logs are then available to Google Cloud Monitoring, where alerts can be configured.
Key components of this solution:
Custom logging can be achieved using the gcloud logging commands, and the resulting logs can drive alerting policies in Google Cloud Monitoring.
The gcloud logging commands can write log entries to, and read log entries from, Cloud Logging. Below are some samples of the write command.
#To create a log entry in a given log, run:
gcloud logging write LOG_NAME "A simple entry"
#To create a high severity log entry, run:
gcloud logging write LOG_NAME "Urgent message" --severity=ALERT
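The samples above only write plain-text entries. As a rough sketch, the two helper functions below wrap writing a structured (JSON) entry and reading recent entries back. The function names and their arguments are hypothetical and chosen for illustration; the gcloud logging write and gcloud logging read commands and the flags used are the standard ones.

```bash
# Write a structured (JSON) entry to a custom log.
# Arguments: log name, JSON payload, severity (defaults to INFO).
write_json_entry() {
  local log_name="$1" payload="$2" severity="${3:-INFO}"
  gcloud logging write "$log_name" "$payload" --payload-type=json --severity="$severity"
}

# Read back the most recent entries of a custom log as JSON.
# Arguments: project ID, log name, number of entries (defaults to 5).
read_recent_entries() {
  local project_id="$1" log_name="$2" limit="${3:-5}"
  gcloud logging read "logName=projects/${project_id}/logs/${log_name}" --limit="$limit" --format=json
}
```

For example, write_json_entry LOG_NAME '{"message":"A simple entry"}' followed by read_recent_entries PROJECT_ID LOG_NAME should show the entry once it has been ingested, which can take a few seconds.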
The custom monitoring service script collects disk utilization metrics periodically and writes them to custom logs in Google Cloud. The service accepts two parameters: a warning limit and an error limit. Depending on which limit is exceeded, the metrics are written to one of two custom logs, disk_mon_warning_logs or disk_mon_alert_logs.
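The routing between the two limits can be sketched as a small function. This is an illustration only: the function name route_utilization is hypothetical, and the sketch assumes that utilization and both limits are whole-number percentages. It compares the values numerically, which matters because a string comparison would rank "9" above "61".

```bash
# Decide which custom log (if any) a utilization value belongs to.
# Arguments: used percent, warning limit, error limit (all integers).
route_utilization() {
  local used="$1" warning_limit="$2" error_limit="$3"
  if [ "$used" -gt "$error_limit" ]; then
    echo "disk_mon_alert_logs"
  elif [ "$used" -gt "$warning_limit" ]; then
    echo "disk_mon_warning_logs"
  else
    # Below or equal to the warning limit: nothing is logged.
    echo "none"
  fi
}
```

With limits of 61 and 75, a volume at 70% would be routed to disk_mon_warning_logs and a volume at 80% to disk_mon_alert_logs.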
Log-based alerting is a type of alerting that uses logs to detect and notify you of events that meet certain criteria. In Google Cloud, log-based alerting is provided by Cloud Monitoring.
Once log-based alerting is configured in Cloud Monitoring, Google Cloud will start monitoring the log stream for events that meet the condition you specified. If an event is detected, Google Cloud will send you an alert notification.
Follow the steps below to implement log-based alerts for monitoring persistent disk utilization.
Below is the sample monitoring script, which runs the df utility every five minutes. Only the root and additional volumes are kept from the df output.
The utilization value for each volume is compared against the configured warning and error limits. Depending on which limit is exceeded, an entry is written to disk_mon_warning_logs or disk_mon_alert_logs using the gcloud logging write command.
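To see what the df filtering step keeps, the pipeline can be tried on its own. The sketch below wraps it in a function that reads df output on standard input; the function name filter_volumes is hypothetical, while the grep pattern and awk columns mirror the ones used in the script.

```bash
# Read df output on stdin and print "used<TAB>percent<TAB>mountpoint"
# for each remaining volume. Lines mentioning boot or tmpfs, remote
# mounts (host:path), and the header line are dropped.
filter_volumes() {
  grep -Ev "boot|tmpfs|:|Filesystem" | awk '{print $3"\t"$5"\t"$6}'
}
```

Running df -H | filter_volumes on the VM prints one line per monitored volume, which is a quick way to confirm that an additional disk is picked up before the service is installed.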
Log into the GCE (Google Compute Engine) VM instance as the root user. Create a file with the below details.
Filename: diskUtilizationScript.sh
Path: /root/
#!/bin/bash
#=============================================
#Fetching the warning limit & alert limit info
#=============================================
echo "Warning limit is set to: [$1]%"
echo "Error limit is set to: [$2]%"
warningLimit="$1"
errorLimit="$2"
#=============================================
#Fetching the project & hostname information
#=============================================
project_id=`gcloud config list --format='text(core.project)' | sed "s/^.*: //g"`
host_name=`hostname`
echo "ProjectId is: ${project_id}"
echo "HostName is: ${host_name}"
while true
do
  #Collecting used space, used percentage & mount point for root and additional volumes
  df -H | grep -Ev "boot|tmpfs|:|Filesystem" | awk '{print $3"\t"$5"\t"$6}' > fileinp.txt
  readarray -t my_array < fileinp.txt
  for line in "${my_array[@]}"; do
    read -r diskused pcntused mountpoint <<< "${line}"
    echo "$diskused--$pcntused -- $mountpoint"
    #Stripping the trailing % so the value can be compared as a number
    compValue="${pcntused%?}"
    if [[ $compValue -gt $warningLimit ]]; then
      if [[ $compValue -gt $errorLimit ]]; then
        JSON_STRING='{"Diskname":"'"$mountpoint"'","Diskspaceused":"'"$diskused"'","Usedpcnt":"'"$pcntused"'","Machine":"'"$host_name"'","Remarks":"'"Threshold is : $errorLimit% and current utilization is: $compValue%"'"}'
        gcloud logging write "disk_mon_alert_logs" "${JSON_STRING}" --payload-type=json --severity=ALERT
        echo "gcloud alert logging done"
      else
        JSON_STRING='{"Diskname":"'"$mountpoint"'","Diskspaceused":"'"$diskused"'","Usedpcnt":"'"$pcntused"'","Machine":"'"$host_name"'","Remarks":"'"Threshold is : $warningLimit% and current utilization is: $compValue%"'"}'
        gcloud logging write "disk_mon_warning_logs" "${JSON_STRING}" --payload-type=json --severity=WARNING
        echo "gcloud warning logging done"
      fi
    fi
    #Flushing variables for loop
    mountpoint=""
    diskused=""
    pcntused=""
  done
  echo "Execution Completed"
  #Waiting five minutes before the next check
  sleep 300
done
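The script assembles its JSON payload by concatenating quoted strings, which is hard to read and breaks if a value contains a double quote. One possible alternative is sketched below using printf. The function names json_escape and build_payload are hypothetical, and the escaping covers only backslashes and double quotes, so this is not a full JSON encoder.

```bash
# Escape backslashes and double quotes so a value is safe inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the payload with the same keys the monitoring script uses.
# Arguments: mount point, space used, percent used, host name, remarks.
build_payload() {
  printf '{"Diskname":"%s","Diskspaceused":"%s","Usedpcnt":"%s","Machine":"%s","Remarks":"%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")" \
    "$(json_escape "$4")" "$(json_escape "$5")"
}
```

The result of build_payload could then be passed to gcloud logging write with --payload-type=json in place of the hand-built JSON_STRING.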
The monitoring script diskUtilizationScript.sh can be run as a systemd service. Follow the steps below to configure it. Once enabled, the service also starts automatically after a system restart.
#To make the script executable, run:
chmod +x /root/diskUtilizationScript.sh
#To create the service unit file, run:
sudo vim /etc/systemd/system/fs-monitor.service
[Unit]
Description=FileSystem monitoring service
Documentation=https://cloud.google.com/logging/docs/agent/ops-agent
[Service]
Type=simple
User=root
Group=root
TimeoutStartSec=0
Restart=on-failure
RestartSec=30s
#ExecStartPre=
ExecStart=/root/diskUtilizationScript.sh 61 75
SyslogIdentifier=Diskutilization
#ExecStop=
[Install]
WantedBy=multi-user.target
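After a unit file is created or edited, systemd generally needs to reload its configuration, and a service only starts at boot once it has been enabled. A minimal sketch of those two steps follows. The run and install_service helpers are hypothetical; the DRY_RUN switch is included only so the commands can be previewed without changing the system.

```bash
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Reload unit files, then enable the given service so it starts at boot.
install_service() {
  local unit="$1"
  run sudo systemctl daemon-reload
  run sudo systemctl enable "$unit"
}
```

Calling install_service fs-monitor.service with DRY_RUN=1 prints the two systemctl commands; without it, they are executed.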
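#To start the service, run:
sudo systemctl start fs-monitor.service
#To check the service status, run:
sudo systemctl status fs-monitor.service
#To stop the service, run:
sudo systemctl stop fs-monitor.service
#To restart the service, run:
sudo systemctl restart fs-monitor.service
#To check the service output (the log file location depends on the Linux distribution), run:
grep -is "Diskutilization" /var/log/daemon.log
grep -is "Diskutilization" /var/log/syslog
grep -is "Diskutilization" /var/log/messages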
To configure notification channels for your alerts, in the Google Cloud Console, navigate to Cloud Monitoring → Notification channels. You can choose from available notifications or create a new channel.
For our use case, we'll create two new notification channels - "Email" and "SMS," for warning and error limit notifications.
Navigate to the Google Cloud Console → Cloud Logging. Follow the below steps to configure the new log-based alerting policies.
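A log-based alerting policy needs a query that selects the entries written by the monitoring script. As a rough illustration, the helper below builds such a query from a project ID, a log name, and a severity. The function name build_alert_filter is hypothetical and the project ID is a placeholder; the logName format and the severity field follow the Logging query language.

```bash
# Build a Logging query that matches one custom log at a given severity.
# Arguments: project ID, log name, severity.
build_alert_filter() {
  local project_id="$1" log_id="$2" severity="$3"
  printf 'logName="projects/%s/logs/%s" AND severity=%s\n' "$project_id" "$log_id" "$severity"
}
```

For example, build_alert_filter PROJECT_ID disk_mon_warning_logs WARNING prints a query for the warning log, and the same call with disk_mon_alert_logs and ALERT prints one for the error log. Either query can be pasted into Logs Explorer to check that it matches the expected entries.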
Once all the above steps are completed, custom log-based alerting will be enabled in your Google Compute Engine VM instance.
After configuring this alert, if the disk utilization for any volume is above 61% but not above 75%, a warning alert will be triggered via the Email notification channel.
If the disk utilization for any volume exceeds 75%, then an error alert will be triggered via the SMS notification channel.
This solution enables Google Cloud users to configure custom alerting on disk space utilization, including for additional disks attached to compute instances. You can easily customize the limits and opt in to alerts via various notification channels based on criticality.
Google Cloud log-based alerting is a powerful tool that can help you to detect and respond to events that impact your Google Cloud infrastructure. By creating well-configured log-based alerts, you can improve the reliability and security of your applications.
Have questions? Please leave a comment below and someone from the Google Cloud team or Community will be happy to help.