I am attempting to create a log-based alert, but I am running into some issues (possibly due to a lack of documentation) that prevent the alerts from triggering. I am doing this through Terraform, but I have also tried it through gcloud with a policy JSON file, with the same results.
I have tried creating a simple log-based alert with the following alert strategy:
```json
"alertStrategy": {
  "autoClose": "86400s",
  "notificationRateLimit": {
    "period": "1800s"
  }
}
```
My understanding is that if no further errors of the same type (as defined by the combination of label extractor values) occur within a day of the last received error, the incident auto-closes, and that if another error of the same type occurs within 30 minutes of the previous one, no additional notification is sent.
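To make the expected behavior concrete, here is a small Python model of these semantics as I read them. It is only a sketch of my own understanding, not Cloud Monitoring's implementation; the function name `notifications` and the choice to measure the 30-minute window from the previous matching error are my assumptions.

```python
# Hypothetical model of the alertStrategy semantics as I understand them.
# This is NOT how Cloud Monitoring implements log-based alerts.
AUTO_CLOSE_S = 86_400   # "autoClose": "86400s"
RATE_LIMIT_S = 1_800    # "notificationRateLimit": {"period": "1800s"}


def notifications(matches):
    """Return the timestamps at which a notification would be sent.

    `matches` is a time-ordered list of (timestamp_seconds, labels) pairs,
    where `labels` is a tuple of extracted label values. Each distinct
    tuple is treated as its own incident.
    """
    last_seen = {}  # labels -> timestamp of the previous matching log entry
    sent = []
    for ts, labels in matches:
        prev = last_seen.get(labels)
        if prev is None or ts - prev >= AUTO_CLOSE_S:
            # No open incident for this label combination (never opened, or
            # auto-closed after a day without matches): open one and notify.
            sent.append(ts)
        elif ts - prev >= RATE_LIMIT_S:
            # Incident still open, but the match is outside the rate-limit window.
            sent.append(ts)
        # Otherwise the match falls inside the 30-minute window: no notification.
        last_seen[labels] = ts
    return sent


print(notifications([(0, ("boom",)), (100, ("other",)), (600, ("boom",)), (2_400, ("boom",))]))
# [0, 100, 2400]
```

Under this model the very first matching error for any label combination always produces a notification, whatever the `autoClose` and rate-limit values are, which is why I expected an alert in my test.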
When I generated test error logs that match the log query, no alert was triggered. When I set the notification rate limit to 300s and autoClose to 1800s, it works. Is this a bug in log-based alerts, or do I misunderstand how these values work?
The other issue I am seeing is with the label extractors. I have tried setting these extractors:
```json
"labelExtractors": {
  "message": "REGEXP_EXTRACT(jsonPayload.message, \"(.+)\")",
  "url": "REGEXP_EXTRACT(httpRequest.requestUrl, \"(.+)\")"
}
```
However, with these extractors I also received no alerts for matching error logs. I found that removing the "url" label extractor solved the problem, and on further inspection I realized that the `httpRequest.requestUrl` field did not exist in the error logs I expected to trigger the alert. The documentation for the label extractor field at https://cloud.google.com/logging/docs/reference/v2/rest/v2/projects.metrics#LogMetric.FIELDS.value_e... says the following:
The extracted value is converted to the type defined in the label descriptor. If either the extraction or the type conversion fails, the label will have a default value. The default value for a string label is an empty string, for an integer label its 0, and for a boolean label its false.
So I would have expected the log entry to get a "url" label with an empty string in this case and still trigger an alert. Am I missing something here?
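Here is how I read that documented default-value rule, as a hypothetical Python emulation. The helpers `get_field`, `regexp_extract` and `extract_labels` are my own names, Python's `re` module stands in for the real regex engine, and this is not how Cloud Logging actually evaluates `REGEXP_EXTRACT`.

```python
import re

# Hypothetical emulation of the documented LogMetric label-extractor rule:
# if extraction fails, a STRING label gets its default value (empty string).


def get_field(entry, path):
    """Walk a dotted path such as 'httpRequest.requestUrl'; return None if absent."""
    value = entry
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def regexp_extract(entry, path, pattern):
    """Return the first capture group, or '' when the field or match is missing."""
    value = get_field(entry, path)
    if not isinstance(value, str):
        return ""  # extraction failed: documented default for a string label
    match = re.search(pattern, value)
    return match.group(1) if match else ""


def extract_labels(entry, extractors):
    return {name: regexp_extract(entry, path, pattern)
            for name, (path, pattern) in extractors.items()}


# Same extractors as in my policy, expressed as (field path, regex) pairs.
EXTRACTORS = {
    "message": ("jsonPayload.message", r"(.+)"),
    "url": ("httpRequest.requestUrl", r"(.+)"),
}

# An error entry without httpRequest, like the ones that failed to alert.
entry = {"severity": "ERROR", "jsonPayload": {"message": "db timeout"}}
print(extract_labels(entry, EXTRACTORS))
# {'message': 'db timeout', 'url': ''}
```

Going by the documentation, an entry like this should still produce labels (with `url` empty) rather than being dropped, which is why the missing alerts surprised me.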
For reference, this is the entire policy I am trying to set up:
```json
{
  "alertStrategy": {
    "autoClose": "86400s",
    "notificationRateLimit": {
      "period": "1800s"
    }
  },
  "combiner": "OR",
  "conditions": [
    {
      "conditionMatchedLog": {
        "filter": "resource.type = \"cloud_run_revision\" AND resource.labels.configuration_name =~ \"my-service\" AND severity >= ERROR\n",
        "labelExtractors": {
          "message": "REGEXP_EXTRACT(jsonPayload.message, \"(.+)\")",
          "url": "REGEXP_EXTRACT(httpRequest.requestUrl, \"(.+)\")"
        }
      },
      "displayName": "My service has returned an error",
      "name": "projects/my-google-project/alertPolicies/12413651413025732590/conditions/20614653453035532681"
    }
  ],
  "creationRecord": {
    "mutateTime": "2022-08-08T16:28:58.273577647Z",
    "mutatedBy": "me@mycompany.com"
  },
  "displayName": " My Service",
  "documentation": {
    "content": "The service has experienced errors",
    "mimeType": "text/markdown"
  },
  "enabled": true,
  "mutationRecord": {
    "mutateTime": "2022-08-08T17:29:29.565788300Z",
    "mutatedBy": "me@mycompany.com"
  },
  "name": "projects/my-google-project/alertPolicies/20633654413535762590",
  "notificationChannels": [
    "projects/my-google-project/notificationChannels/1147262444713249459"
  ],
  "userLabels": {
    "severity": "critical"
  }
}
```
Hi,
The documentation below describes the limitations of log-based alerts:
https://cloud.google.com/logging/docs/alerting/monitoring-logs#alert-diffs