
GCP API Gateway mapped to Cloud Run Service returns 503

Hi everyone,

I'm trying to understand the following error that I'm currently getting.

The architecture is pretty basic:

  • FastAPI running on Cloud Run Service.
  • API Gateway configured to route requests to my FastAPI endpoints running on the Cloud Run Service.
  • Web application sending requests to API Gateway.

The issue is that 99% of requests are completed without problems but sometimes API Gateway returns a 503 with the following message:

  •  upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection termination.

When checking the Logs Explorer for the API Gateway I also see this information:

  • response_code_detail: "upstream_reset_before_response_started{connection_termination}"

Is this related to the fact that the minimum number of instances for the Cloud Run Service is set to 0, so that when such a request comes in there are no active instances ready to serve it?

Any advice on how to handle this or any further ideas?

Best regards,

Jaime

Hi @JaimeFabian,

Welcome to Google Cloud Community!

The error message you're seeing, "upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection termination," suggests that there is an issue with the connection between API Gateway and your Cloud Run Service. The "upstream_reset_before_response_started" part of the message indicates that the connection was reset before a response could be sent from the Cloud Run Service to API Gateway.

One possible explanation for this issue is a cold start on your Cloud Run Service. As you mentioned, the minimum number of instances is set to 0, which means that the service will scale down to zero instances when there is no traffic. If a request comes in while there are no active instances, Cloud Run has to start a new instance before the request can be served. If that startup is slow, or the connection is closed while the instance is still starting, API Gateway can see the connection reset before any response headers arrive and return a 503 error.
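Because these failures are intermittent, the calling application could also retry a 503 a small number of times with a growing delay. The following is only a minimal sketch of that idea: `call_with_retry` and the `send` callable (which is assumed to return a `(status, body)` tuple) are hypothetical names, not part of API Gateway or Cloud Run, and retries like this are only safe for requests that can be repeated without side effects.

```python
import time


def call_with_retry(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-503 status or attempts run out.

    send is a hypothetical zero-argument callable that performs one request
    and returns a (status, body) tuple.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        is_last_attempt = attempt == max_attempts - 1
        if status != 503 or is_last_attempt:
            return status, body
        # Exponential backoff: base_delay, then 2x, then 4x, ...
        sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable only so the function can be exercised without real waiting; in an application the default `time.sleep` would be used.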

You can try a few things to address this issue:

  1. Increase the minimum number of instances for your Cloud Run Service to ensure that there are always some instances running to handle incoming requests.
  2. Increase the number of requests per second (RPS) that your Cloud Run Service can handle by adjusting the maximum number of container instances and the number of concurrent requests each instance accepts. This will ensure that your service can handle more traffic and reduce the likelihood of a 503 error.
  3. Check your logs to see what the typical request rate is, and adjust your instance autoscaling policy accordingly.
  4. Consider configuring a startup probe (health check) for your service so that Cloud Run only sends requests to a new instance once the application is ready to serve them.
  5. Monitor the resource consumption and other metrics of the Cloud Run service to ensure that it's running efficiently and that resources such as memory and CPU aren't being exhausted by requests.
  6. Cloud Run already distributes incoming requests across the running instances. If a single instance is being overloaded, lowering the maximum number of concurrent requests per instance spreads the load over more instances and reduces the chance of a 503 error.
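
For points 2 and 3, a rough capacity estimate can help when choosing instance limits. The sketch below applies Little's law (requests in flight = request rate × average latency) and divides by the per-instance concurrency. The function name and the example numbers are hypothetical; real values would come from your own logs and metrics, and the result is only a starting point, not a sizing guarantee.

```python
import math


def estimate_instances(requests_per_second: float,
                       avg_latency_seconds: float,
                       concurrency_per_instance: int) -> int:
    """Rough number of instances needed to serve a steady request rate."""
    if concurrency_per_instance < 1:
        raise ValueError("concurrency_per_instance must be at least 1")
    # Little's law: requests in flight = arrival rate * time per request.
    in_flight = requests_per_second * avg_latency_seconds
    # Always keep at least one instance in the estimate.
    return max(1, math.ceil(in_flight / concurrency_per_instance))


# Hypothetical figures: 40 requests/s, 0.5 s average latency,
# 10 concurrent requests per instance.
print(estimate_instances(40, 0.5, 10))  # prints 2
```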

It's also worth noting that in some cases, the 503 error may be caused by a problem on the Cloud Run Service side, such as a bug in the service's code or a resource exhaustion issue. It's a good idea to check the logs of your service to see if there are any errors or warning messages that might indicate the cause of the issue.

It's good practice to keep monitoring and logging the requests and responses of your service so that you can investigate why certain requests resulted in a 503 error and take appropriate steps.
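
As one way to start that investigation, exported gateway log entries could be grouped by reset reason. This is a minimal sketch that assumes each entry has already been flattened into a Python dict: `response_code_detail` is the field shown in the question, while the `response_code` key and the function name are assumptions made for illustration and may not match the actual log schema.

```python
import re
from collections import Counter

# Matches values such as
# "upstream_reset_before_response_started{connection_termination}".
RESET_PATTERN = re.compile(r"^(?P<kind>[a-z_]+)\{(?P<reason>[^}]*)\}$")


def count_reset_reasons(entries):
    """Count 503 responses per (kind, reason) from flattened log entries."""
    counts = Counter()
    for entry in entries:
        # "response_code" is an assumed key name for the HTTP status.
        if entry.get("response_code") != 503:
            continue
        detail = entry.get("response_code_detail", "")
        match = RESET_PATTERN.match(detail)
        if match:
            counts[(match["kind"], match["reason"])] += 1
        else:
            # Keep details that carry no {reason} part as they are.
            counts[(detail or "unknown", "")] += 1
    return counts
```

Comparing the timestamps of the counted entries with the Cloud Run instance count at those moments would show whether the resets line up with scale-from-zero events.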

Thank you.

Hi!

Thanks for the detailed reply! I will just have to monitor this more closely, since it does not happen all the time. I've also checked my Cloud Run Services: at the time of the failing requests the instances are normally idle, but I was expecting an instance to become active so that the request could be handled.

CPU and memory do not seem to be a problem, since they are barely at 30% to 35%.

I will keep an eye on this issue.

Thank you!

Jaime