Cloud Run timeouts when autoscaling

cgaunet · 12-12-2024 07:09 AM

Hello!

We’ve been experiencing timeouts on our Cloud Run service over the past few days, possibly correlated with an increase in traffic.

We’ve tried to investigate the issue by analyzing all available metrics on the Cloud Run dashboard and reviewing the logs to determine the cause.

Our best guess is that some requests are pending while the service is starting up, and they arrive simultaneously, which overwhelms the newly started instance.

Additionally, we’ve observed that sometimes an instance will return a 503 Instance Unavailable error after functioning correctly for a while, which makes it difficult to pinpoint the problem.

CPU and memory usage seem normal, so we’re at a bit of a dead end in our investigation.

Does anyone have any insights or suggestions for handling this kind of situation?

mcbsalceda

Hi @cgaunet,

When traffic spikes, Cloud Run might need to spin up new instances, and requests can queue up while a new instance starts, leading to timeouts. You can reduce this by setting the minimum number of instances to at least 1, so that a warm instance is always running.
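To get a rough feel for whether queued requests could overwhelm a fresh instance, here is a minimal back-of-the-envelope sketch. The function names and the example numbers (30 requests per second, a 4-second startup, a concurrency limit of 80) are assumptions for illustration only, not values taken from your service; plug in what you see on your own dashboard.

```python
import math


def cold_start_backlog(requests_per_second: float, startup_seconds: float) -> int:
    """Estimate how many requests pile up while a new instance is still starting."""
    return math.ceil(requests_per_second * startup_seconds)


def exceeds_concurrency(backlog: int, max_concurrency: int) -> bool:
    """True if the backlog is larger than what one instance accepts at once."""
    return backlog > max_concurrency


# Hypothetical numbers: 30 req/s arriving during a 4 s startup.
backlog = cold_start_backlog(requests_per_second=30, startup_seconds=4)
print(backlog, exceeds_concurrency(backlog, max_concurrency=80))  # prints: 120 True
```

If the estimated backlog is well above the concurrency limit of a single instance, that would be consistent with the burst you described hitting the newly started instance all at once.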

Cloud Run scales every 5 seconds, so under high load scaling might not keep up. You might want to adjust the max instances setting, or consider adding a small grace period to better absorb traffic spikes.
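As a sanity check on the max instances setting, you could estimate how many instances your peak load needs. The sketch below uses Little's law (requests in flight = arrival rate × average latency); the function name and the sample figures are hypothetical, and real traffic is burstier than this average suggests, so treat the result as a lower bound rather than a recommendation.

```python
import math


def instances_needed(requests_per_second: float,
                     avg_latency_seconds: float,
                     concurrency_per_instance: int) -> int:
    """Rough lower bound on instances needed to serve a steady load."""
    # Little's law: average number of requests being processed at once.
    in_flight = requests_per_second * avg_latency_seconds
    # Each instance handles at most `concurrency_per_instance` requests at once.
    return max(1, math.ceil(in_flight / concurrency_per_instance))


# Hypothetical peak: 200 req/s, 0.5 s average latency, concurrency of 10.
print(instances_needed(200, 0.5, 10))  # prints: 10
```

If the number you get is close to or above your configured max instances, requests will queue at the limit no matter how quickly new instances start.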

To resolve your issue regarding the 503 error, you may check these troubleshooting guides:

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.