Hi everyone,
I'm trying to understand the following error that I'm currently getting.
The architecture is pretty basic:
The issue is that 99% of requests are completed without problems but sometimes API Gateway returns a 503 with the following message:
When checking the Logs Explorer for the API Gateway I also see this information:
Is this related to the fact that the minimum number of instances for the Cloud Run service is set to 0, so that when such a request arrives there are no active instances ready to serve it?
Any advice on how to handle this or any further ideas?
Best regards,
Jaime
Hi @JaimeFabian,
Welcome to Google Cloud Community!
The error message you're seeing, "upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection termination," suggests that there is an issue with the connection between API Gateway and your Cloud Run Service. The "upstream_reset_before_response_started" part of the message indicates that the connection was reset before a response could be sent from the Cloud Run Service to API Gateway.
One possible explanation is that no instance of your Cloud Run service is running when the request arrives. As you mentioned, the minimum number of instances is set to 0, which means the service scales down to zero instances when there is no traffic. If a request comes in while there are no active instances, Cloud Run first has to start a new one (a cold start), and if the connection is reset before that instance responds, API Gateway may return a 503 error.
You can try a few things to address this issue:
It's also worth noting that in some cases, the 503 error may be caused by a problem on the Cloud Run Service side, such as a bug in the service's code or a resource exhaustion issue. It's a good idea to check the logs of your service to see if there are any errors or warning messages that might indicate the cause of the issue.
It's also good practice to monitor and log all requests and responses of your service, so that you can investigate why certain requests resulted in a 503 error and take appropriate steps.
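While you investigate, one common way to cope with intermittent 503s on the caller's side is to retry with exponential backoff. The following is only a minimal sketch, not an official recommendation for API Gateway: the function name `fetch_with_retry`, its parameters, and the default delays are all hypothetical, and it assumes the request is a GET that is safe to repeat.

```python
import random
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=4, base_delay=0.5,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """GET `url`, retrying on HTTP 503 with exponential backoff and jitter.

    `opener` and `sleep` are injectable (hypothetical parameters) so the
    retry logic can be exercised without real network calls or waiting.
    """
    for attempt in range(attempts):
        try:
            with opener(url, timeout=30) as response:
                return response.status, response.read()
        except urllib.error.HTTPError as err:
            # Retry only on 503; re-raise other errors and the final failure.
            if err.code != 503 or attempt == attempts - 1:
                raise
        # Back off 0.5 s, 1 s, 2 s, ... plus up to 100 ms of random jitter.
        sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

A call would look like `status, body = fetch_with_retry("https://<your-gateway-host>/<path>")`, where the URL is a placeholder. Retrying hides the symptom from callers but does not remove the cause, so it is best combined with the log checks mentioned above, and it should only be applied to requests that are safe to send more than once.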
Thank you.
Hi!
Thanks for the detailed reply! I will just have to monitor this more closely, since it does not happen all the time. I've also checked my Cloud Run services, and at the time of the failing requests the instances are normally idle, but I was expecting an instance to become active so that the request could be handled.
CPU and memory do not seem to be a problem, since they are barely at 30% to 35%.
I will keep an eye on this issue.
Thank you!
Jaime