Cloud function healthcheck before end of deployment

gasgallo · 08-19-2024 02:28 AM

Hello,

I'm facing a rather weird issue with Cloud Functions.

Sometimes, while updating a function (gcloud functions deploy), I get a bunch of error logs due to the healthcheck sending requests to the instance even though the deployment is not completed yet. An example timeline is the following:

update function begins at 12.00am
healthcheck spams requests at 12.02am
error logs
update function complete at 12.03am
no more error logs

Here's a sample of the error log:

{
  "textPayload": "The request was aborted because there was no available instance. Additional troubleshooting documentation can be found at: https://cloud.google.com/functions/docs/troubleshooting#scalability",
  "insertId": "xxx",
  "httpRequest": {
    "requestMethod": "GET",
    "requestUrl": "<URL removed by staff>",
    "requestSize": "245",
    "status": 500,
    "remoteIp": "xxx",
    "latency": "0s",
    "protocol": "HTTP/1.1"
  },
  "resource": {
    "type": "cloud_function",
    "labels": {
      "function_name": "xxx",
      "project_id": "xxx",
      "region": "us-central1"
    }
  },
  "timestamp": "2024-08-19T06:16:30.343513Z",
  "severity": "ERROR",
  "labels": {
    "infrastructure": "error"
  },
  "logName": "xxx/logs/cloudfunctions.googleapis.com%2Fcloud-functions",
  "trace": "xxxx/traces/312ca73771fd71a7162dcb4b86b55159",
  "receiveTimestamp": "2024-08-19T06:16:30.635757405Z",
  "spanId": "xxx",
  "traceSampled": true
}
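
For anyone triaging similar entries, here is a minimal Python sketch that checks whether an entry like the one above is an infrastructure-level 500 and whether it falls inside a deployment window. The entry is a trimmed copy of the sample; the window times and the helper names (`is_infrastructure_error`, `within_window`) are assumptions made up for illustration, not values or APIs from this thread.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the sample entry; only the fields used below are kept.
SAMPLE_ENTRY = """
{
  "httpRequest": {"requestMethod": "GET", "status": 500, "latency": "0s"},
  "resource": {"type": "cloud_function", "labels": {"region": "us-central1"}},
  "timestamp": "2024-08-19T06:16:30.343513Z",
  "severity": "ERROR",
  "labels": {"infrastructure": "error"}
}
"""


def parse_timestamp(value: str) -> datetime:
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it on Python < 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_infrastructure_error(entry: dict) -> bool:
    # Matches the shape of the sample: severity ERROR, HTTP 500,
    # and the "infrastructure": "error" label.
    return (
        entry.get("severity") == "ERROR"
        and entry.get("labels", {}).get("infrastructure") == "error"
        and entry.get("httpRequest", {}).get("status") == 500
    )


def within_window(entry: dict, start: datetime, end: datetime) -> bool:
    # True when the entry's timestamp lies inside [start, end].
    return start <= parse_timestamp(entry["timestamp"]) <= end


entry = json.loads(SAMPLE_ENTRY)

# Hypothetical deployment window; the real start and end times would come
# from the deploy operation, which is not shown in this thread.
deploy_start = datetime(2024, 8, 19, 6, 14, tzinfo=timezone.utc)
deploy_end = datetime(2024, 8, 19, 6, 17, tzinfo=timezone.utc)

print(is_infrastructure_error(entry), within_window(entry, deploy_start, deploy_end))
```

If every error entry falls inside the deployment window and none appear outside it, that would support the timeline described above.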

Is this behaviour expected?

ronnelg

Hi @gasgallo,

Welcome to Google Cloud Community!

I see that your health check sends frequent requests, and that some of them return the pending-queue abort error "The request was aborted because there was no available instance" while you’re updating an existing function. This is very likely intended behavior, as the scenarios below illustrate.

This error usually arises either because the max_instances value you’ve set is preventing Cloud Functions from scaling (identified by error code 429), or because Cloud Functions is itself having trouble handling the ongoing traffic (identified by error code 500). Your error message indicates that your case falls under code 500.
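
As a rough illustration of the distinction above, the mapping from status code to likely cause can be sketched in Python. The function name `likely_cause` and the returned strings are made up for this example; they paraphrase the explanation in this reply and are not an official API or error catalogue.

```python
def likely_cause(status: int) -> str:
    """Map the HTTP status of a "no available instance" error to the
    cause described in this reply (illustrative only)."""
    if status == 429:
        return "max_instances limit is preventing scale-up"
    if status == 500:
        return "function is having trouble handling ongoing traffic"
    return "not covered by this explanation"


# The sample log entry in this thread has "status": 500.
print(likely_cause(500))
```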

Here are the recommended methods that may help to address this behavior:

Double-check the condition of your function. Error 500 suggests a possible spike in traffic (perhaps to your function’s current version), caused by conditions such as higher latency in the function’s internal resources or more queries per second (QPS) than the function can handle. This could disrupt traffic migration while the function is being updated.
Modify your health check. If you’ve configured a custom health check for your function, it may be making requests too frequently. If the backend service takes longer to create the new version and redirect traffic, the health check may log errors.
Change your max_instances value. Although your error code is not 429, try increasing the maximum instances limit so that your function does not become overloaded as it scales up while migrating traffic to its new version.

If none of these methods helps, note that our team is also keeping an eye on this scenario. We have documented the issue in the public tracker for further investigation.

I hope the possible solutions above work out for you.

gasgallo

Hi @ronnelg,

Thank you for the reply. I already went through the troubleshooting steps you suggested, but they don't apply to my scenario.

In my case, there's no traffic during the cloud function update, because the function is triggered only at specific times of the day.

Moreover, the healthcheck failure comes from an internal GCP healthcheck (not a custom healthcheck): the endpoint sending the healthcheck ends with `appspot.com/_ah/healthcheck`.
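
To separate these internal healthcheck requests from ordinary invocations when reviewing logs, a small Python sketch like the following could be used. It assumes the URL ends with `appspot.com/_ah/healthcheck` as described above; the example URLs and the function name `is_platform_healthcheck` are hypothetical, since the real URL was removed from the log sample.

```python
from urllib.parse import urlsplit

# Path suffix reported in this thread for the internal healthcheck.
HEALTHCHECK_PATH = "/_ah/healthcheck"


def is_platform_healthcheck(request_url: str) -> bool:
    """Return True when the URL looks like <something>.appspot.com/_ah/healthcheck."""
    parts = urlsplit(request_url)
    host = parts.hostname or ""
    on_appspot = host == "appspot.com" or host.endswith(".appspot.com")
    return on_appspot and parts.path == HEALTHCHECK_PATH


# Hypothetical URLs, for illustration only.
print(is_platform_healthcheck("https://example-project.appspot.com/_ah/healthcheck"))
print(is_platform_healthcheck("https://us-central1-example-project.cloudfunctions.net/my-function"))
```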

Finally, the number of max instances is already set to 3000, thus I don't think this is the issue.

Looking forward to more suggestions and updates.
