
Cloud Run WebSocket service scaling for no apparent reason

Hi! I'm running a WebSocket server on Cloud Run. The settings I currently have are:

  • Max Instances: 10
  • Concurrency: 1000
  • Request Timeout: 3600s

During peak hours, the metrics for this service are:

  • Max CPU usage: 20%
  • Max memory usage: 30%
  • Max concurrent requests: 500
  • Containers: 12 (??)

Why is Cloud Run scaling the service so heavily when my CPU usage, memory usage, and number of concurrent requests are all well below their respective limits? Am I missing something?

Additional Info:

  • I am using the Warp library in Rust, which has no internal request limits (a minimal sketch of the server setup is just after this list).
  • To be very clear: I have already set max concurrency to 1000, and I'm only receiving 500 concurrent requests. CPU and memory usage never exceed the figures listed above, and the traffic is not bursty.
  • I am aware that long-lived WebSocket connections mean containers will be slow to scale down (each container must finish its long-lived requests first), but this should have no impact on scaling up.
  • I have read the Cloud Run documentation on concurrency and WebSockets, but couldn't find anything there that explains this.
  • I have tried halving the request timeout to 30 minutes, but this made no difference.
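
For reference, here is a minimal sketch of roughly how the server is set up (the route name, echo handler, and dependency versions are placeholders, not the real passthrough logic): it listens on the port Cloud Run provides via the PORT environment variable and upgrades requests to WebSockets with Warp.

  // Rough sketch only. Assumed crates: warp 0.3, tokio 1 (full), futures-util 0.3.
  use futures_util::StreamExt;
  use warp::Filter;

  #[tokio::main]
  async fn main() {
      // Cloud Run tells the container which port to listen on via $PORT.
      let port: u16 = std::env::var("PORT")
          .ok()
          .and_then(|p| p.parse().ok())
          .unwrap_or(8080);

      // Upgrade requests on /ws to WebSockets. The echo handler below is a
      // placeholder for the real passthrough logic.
      let ws_route = warp::path("ws")
          .and(warp::ws())
          .map(|ws: warp::ws::Ws| {
              ws.on_upgrade(|socket| async move {
                  let (tx, rx) = socket.split();
                  // Echo every message straight back to the client.
                  if let Err(e) = rx.forward(tx).await {
                      eprintln!("websocket error: {e}");
                  }
              })
          });

      warp::serve(ws_route).run(([0, 0, 0, 0], port)).await;
  }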

Any help with this would be greatly appreciated!

 


One thought I have: if you're using multiple vCPUs, is your code actually capable of utilizing all of them? For example, if you set CPU to 4 but your container only really uses 1 CPU, then even though the reported utilization looks low, the instance can't actually serve more requests concurrently. Some language runtimes don't do a good job of spreading work across multiple CPUs. If this is the case, try setting CPU to 1 and see if that helps; you would see more instances, but each would be cheaper.
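
To illustrate what I mean (a rough sketch assuming a tokio-based server; this is not meant to be your actual code): a runtime built on a single worker thread can never use more than one core, no matter how many vCPUs the instance has, so the extra CPUs sit idle while the instance still hits its effective concurrency ceiling.

  // Illustrative only: this runtime is pinned to a single worker thread,
  // so it will only ever use one core regardless of the vCPU setting.
  // (The default #[tokio::main] multi-thread runtime instead sizes itself
  // to the number of available cores.)
  fn main() {
      let runtime = tokio::runtime::Builder::new_current_thread()
          .enable_all()
          .build()
          .expect("failed to build runtime");

      runtime.block_on(async {
          // warp::serve(...) launched here would be limited to one core.
      });
  }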

Thanks! That's a good suggestion, but unfortunately I'm only using 1 vCPU.

Could it be I/O limits, then? Are you calling a downstream resource that doesn't scale beyond a certain amount? Or a VPC connector with too small an instance size?

I am connecting to another Cloud Run service over WebSocket (the service this thread refers to essentially acts as a "passthrough" between that other service and the client); however, it is only a single connection between the two services regardless of the number of concurrent client requests.

I am not connecting the services together through a VPC, so I don't think that's the issue.

I will run some tests and check the behaviour of the service when it is not connected to the external resource; hopefully that will narrow it down.

Thanks!

Just to update: I have tried running without this connection, and it still behaves the same way.

I have also tried load testing locally (with 2000 clients), and the CPU usage of the process did not increase significantly.
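
For reference, the load test was roughly along these lines (a simplified sketch using tokio-tungstenite; the URL, message, and hold time are placeholders):

  // Simplified load-test sketch. Assumed crates: tokio 1 (full),
  // tokio-tungstenite, futures-util 0.3.
  use futures_util::SinkExt;
  use tokio_tungstenite::{connect_async, tungstenite::Message};

  #[tokio::main]
  async fn main() {
      let url = "ws://127.0.0.1:8080/ws"; // local server under test
      let clients = 2000;

      let mut handles = Vec::with_capacity(clients);
      for i in 0..clients {
          handles.push(tokio::spawn(async move {
              match connect_async(url).await {
                  Ok((mut ws, _response)) => {
                      // Send one message, then hold the connection open so the
                      // server sees long-lived concurrent requests.
                      let _ = ws.send(Message::Text(format!("hello from client {i}").into())).await;
                      tokio::time::sleep(std::time::Duration::from_secs(60)).await;
                  }
                  Err(e) => eprintln!("client {i} failed to connect: {e}"),
              }
          }));
      }

      for handle in handles {
          let _ = handle.await;
      }
  }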

Some other behaviour I've noticed: if I turn on manual scaling with 1 container, I will eventually receive a "429: No available instance" error. The log message is:

"The request was aborted because there was no available instance. Additional troubleshooting documentation can be found at: https://cloud.google.com/run/docs/troubleshooting#abort-request"

Again, while these errors occur, the number of concurrent connections, CPU usage, and memory usage are all well below their limits.

Is there a way to reset Cloud Run's scaling behaviour back to its initial state? I wonder if Cloud Run is "remembering" to scale up at certain times of day based on previous load, even when it doesn't need to.

Thanks again!

 

I'm having exactly the same issue using FastAPI WebSockets: getting "429: The request was aborted because there was no available instance".
