Cloud Run WebSocket service scaling for no apparent reason

Hi! I'm running a WebSocket server on Cloud Run. My current settings are:

  • Max Instances: 10
  • Concurrency: 1000
  • Request Timeout: 3600s

During peak hours, the metrics for this service are:

  • Max CPU usage: 20%
  • Max memory usage: 30%
  • Max concurrent requests: 500
  • Containers: 12 (??)

Why is Cloud Run scaling the service so heavily when my CPU usage, memory usage, and number of concurrent requests are all well below their respective limits? Am I missing something?
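To make the mismatch concrete, here is a rough sketch of the arithmetic. It assumes the concurrency setting is the only thing driving the instance count, which is a deliberately naive model and not Cloud Run's actual autoscaler; the function name `min_instances_for` is made up for illustration.

```rust
/// Lower bound on the instances needed if the concurrency limit were the
/// only constraint. Naive model, NOT Cloud Run's real autoscaling logic.
fn min_instances_for(concurrent_requests: u32, concurrency_limit: u32) -> u32 {
    assert!(concurrency_limit > 0, "concurrency limit must be positive");
    // Ceiling division: a partially filled instance still counts as one.
    (concurrent_requests + concurrency_limit - 1) / concurrency_limit
}

fn main() {
    // Figures from the post: 500 concurrent requests, concurrency 1000.
    let needed = min_instances_for(500, 1000);
    let observed = 12;
    println!("instances needed by concurrency alone: {}", needed);
    println!("instances observed at peak: {}", observed);
}
```

Under that assumption a single instance should be enough, so whatever is driving the extra containers is something other than the configured concurrency.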

Additional Info:

  • I am using the Warp library in Rust, which has no internal request limits.
  • To be very clear, I have already set max concurrency to 1000, and I'm only receiving 500 concurrent requests. CPU and Memory usage never exceed the limits outlined above; the traffic is not bursty.
  • I am aware that long-lived WebSocket connections mean containers will be slow to scale down (each container has to finish its long-lived requests first), but this should have no impact on scaling up.
  • I have read the Concurrency and WebSockets Cloud Run documentation, from which I could not gain anything useful.
  • I have tried halving the request timeout to 30 minutes, but this made no difference.

Any help with this would be greatly appreciated!

 


One thought I have: if you're using multiple vCPUs, is your code actually capable of utilizing all of them? For example, if you set CPU to 4 and your container only really uses 1 CPU, then even though the utilization looks low, the service isn't able to serve more requests concurrently. Some languages don't do a good job of utilizing multiple CPUs. If this is the case, try setting CPU to 1 and see if that helps. You would see more instances, but each would be cheaper.
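As a rough illustration of why utilization can look low in that situation, here is a small sketch. It assumes the reported CPU figure is averaged across all allocated vCPUs, which is my assumption about the metric rather than something confirmed in this thread; `max_reported_utilization` is a made-up helper name.

```rust
/// Highest instance-wide CPU utilization (in percent) a process can show
/// when it keeps at most `busy_threads` cores busy on `vcpus` cores.
/// Assumes the metric is averaged over all allocated vCPUs.
fn max_reported_utilization(busy_threads: u32, vcpus: u32) -> f64 {
    assert!(vcpus > 0, "an instance needs at least one vCPU");
    // A process cannot keep more cores busy than the instance has.
    let usable = busy_threads.min(vcpus);
    100.0 * f64::from(usable) / f64::from(vcpus)
}

fn main() {
    // A single-threaded server on a 4 vCPU instance versus a 1 vCPU instance.
    println!("4 vCPUs: {}%", max_reported_utilization(1, 4));
    println!("1 vCPU:  {}%", max_reported_utilization(1, 1));
}
```

In this model a fully saturated single-threaded server on 4 vCPUs never shows more than a quarter of the instance's CPU in use, so a low percentage on its own does not rule out a CPU bottleneck.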

Thanks! That's a good suggestion, but unfortunately I'm only using 1 vCPU.

Could it be I/O limits, then? Are you calling a downstream resource that doesn't scale beyond a certain amount? Or using a VPC connector with too small an instance size?

I am connecting to another Cloud Run service over a WebSocket (the service this thread is about essentially acts as a "passthrough" between that other service and a client). However, there is only a single connection between the two services, regardless of the number of concurrent requests.

I am not connecting the services together through a VPC, so I don't think that's the issue.

I will run some tests and check the behaviour of the service when it is not connected to the external resource; hopefully that will narrow it down.

Thanks!

Just to update: I have tried running without this connection, and the service still behaves in the same way.

I have also tried load testing locally (with 2000 clients) and the CPU usage of the process did not increase significantly.

Some other behaviour I've noticed: if I turn on manual scaling with 1 container, I eventually receive a "429 No available instance" error. The log message is:

"The request was aborted because there was no available instance. Additional troubleshooting documentation can be found at: https://cloud.google.com/run/docs/troubleshooting#abort-request"

Again, while these errors occur, the maximum number of concurrent connections is well below the limit, as are CPU and memory usage.

Is there a way to reset the Cloud Run scaling behaviour back to its initial settings? I wonder if Cloud Run is "remembering" to scale at certain times of the day based on previous load, even when it doesn't need to.

Thanks again!