Hi! I'm running a WebSocket server on Cloud Run. The settings I currently have are:
During peak hours, the metrics for this service are:
Why is Cloud Run scaling the service so heavily when my CPU usage, memory usage, and number of requests are well below their respective limits? Am I missing something?
Additional Info:
Any help with this would be greatly appreciated!
One thought I have: if you're using multiple vCPUs, is your code actually capable of using all of them? For example, if you set CPU to 4 but your container only ever uses 1 CPU, utilization will look low even though the service can't actually serve more requests concurrently. Some languages don't do a good job of using multiple CPUs. If that's the case, try setting CPU to 1 and see if that helps: you would see more instances, but each would be cheaper.
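To make the multi-CPU point concrete for Python (the language behind FastAPI, which comes up later in the thread): a single asyncio event loop runs Python code on one core at a time, so using more vCPUs usually means running one worker process per usable CPU. Below is a minimal sketch for estimating that number. It assumes the container exposes a cgroup v2 `cpu.max` file at `/sys/fs/cgroup/cpu.max`, which Cloud Run does not document, so treat the path as an assumption. When the file is missing it falls back to `os.cpu_count()`, which may report the host's CPUs rather than the container's limit.

```python
import math
import os
from pathlib import Path

# Assumption: cgroup v2 is mounted and exposes the CPU limit here.
# Verify this inside your own container before relying on it.
CPU_MAX_PATH = Path("/sys/fs/cgroup/cpu.max")


def cpus_from_cpu_max(text: str, fallback: int) -> int:
    """Parse a cgroup v2 cpu.max value: "<quota> <period>" or "max <period>"."""
    quota, period = text.split()
    if quota == "max":
        # No quota is set, so use the fallback count.
        return fallback
    # Round a fractional quota (e.g. 1.5 CPUs) up to whole workers.
    return max(1, math.ceil(int(quota) / int(period)))


def usable_cpus() -> int:
    """Best-effort count of CPUs this process may actually use."""
    # os.cpu_count() can report the host's CPUs, not the container's limit.
    fallback = os.cpu_count() or 1
    try:
        return cpus_from_cpu_max(CPU_MAX_PATH.read_text(), fallback)
    except (OSError, ValueError):
        return fallback


if __name__ == "__main__":
    # One single-threaded worker (e.g. one asyncio event loop) per usable CPU.
    print(f"start {usable_cpus()} worker process(es)")
```

With 1 vCPU, as in this thread, the count should come out as 1, and a single event loop is already the right shape.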
Thanks! That's a good suggestion, but unfortunately I'm only using 1 vCPU.
Could it be I/O limits, then? Are you calling a downstream resource that doesn't scale beyond a certain amount? Or using a VPC connector with too small an instance size?
I am connecting to another Cloud Run service over a WebSocket (the service this thread is about essentially acts as a "passthrough" between that other service and the clients). However, there is only a single connection between the two services, regardless of the number of concurrent requests.
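For readers picturing the passthrough design: one way to share a single upstream connection among many clients is a fan-out, where one reader task copies each upstream message into a bounded per-client queue. This is a hypothetical sketch, not the poster's code. `Fanout`, `pump`, `demo`, and the `None` end-of-stream sentinel are illustrative names, and an `asyncio.Queue` stands in for the real upstream WebSocket so the example runs without a network or third-party packages.

```python
import asyncio


class Fanout:
    """Relay messages from one shared upstream connection to many clients.

    Hypothetical sketch: each client gets its own bounded queue, so a slow
    client cannot stall the single upstream reader.
    """

    def __init__(self, max_backlog: int = 100) -> None:
        self.max_backlog = max_backlog
        self.clients: set[asyncio.Queue] = set()
        self.dropped = 0  # messages discarded because a client's queue was full

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=self.max_backlog)
        self.clients.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self.clients.discard(queue)

    def publish(self, message: str) -> None:
        # Never await here: one blocked client must not hold up the others.
        for queue in self.clients:
            try:
                queue.put_nowait(message)
            except asyncio.QueueFull:
                self.dropped += 1


async def pump(upstream: asyncio.Queue, fanout: Fanout) -> None:
    """Copy upstream messages to every client until the None sentinel arrives."""
    while (message := await upstream.get()) is not None:
        fanout.publish(message)


async def demo() -> tuple[str, str, int]:
    fanout = Fanout(max_backlog=1)  # tiny backlog so the drop path is visible
    a, b = fanout.subscribe(), fanout.subscribe()
    upstream: asyncio.Queue = asyncio.Queue()  # stand-in for the upstream socket
    for message in ("hello", "world", None):
        upstream.put_nowait(message)
    await pump(upstream, fanout)
    return a.get_nowait(), b.get_nowait(), fanout.dropped


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ('hello', 'hello', 2)
```

The design choice worth noting is that `publish` never awaits a client. If it did, one slow client could stall the single upstream reader and, with it, every other client. Dropping messages is only one policy; disconnecting a client whose queue fills up is another.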
I am not connecting the services through a VPC, so I don't think that's the issue.
I will run some tests and check the behaviour of the service when it is not connected to the external resource; hopefully that will narrow it down.
Thanks!
Just to update: I have tried running without this connection, and it still behaves the same way.
I have also tried load testing locally (with 2000 clients), and the CPU usage of the process did not increase significantly.
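A rough way to reproduce that kind of local test without extra dependencies is to open N long-lived connections against a local server and watch the process's CPU while they sit idle. This sketch uses plain TCP through `asyncio` streams as a stand-in for WebSockets (an assumption made so it runs on the standard library alone), so it says nothing about Cloud Run's own request handling. `load_test`, `open_client`, and `hold_seconds` are hypothetical names.

```python
import asyncio

# Stand-in: plain TCP instead of WebSocket so this needs only the standard library.


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Server side: echo each line back until the client disconnects."""
    while line := await reader.readline():
        writer.write(line)
        await writer.drain()
    writer.close()


async def open_client(port: int) -> tuple[bool, asyncio.StreamWriter]:
    """Connect, do one ping/echo round trip, and keep the connection open."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"ping\n")
    await writer.drain()
    ok = await reader.readline() == b"ping\n"
    return ok, writer


async def load_test(n_clients: int, hold_seconds: float = 1.0) -> int:
    """Hold n_clients idle connections open at once; return how many echoed."""
    # Raise the listen backlog so a burst of connects is not throttled.
    server = await asyncio.start_server(
        handle, "127.0.0.1", 0, backlog=max(100, n_clients)
    )
    port = server.sockets[0].getsockname()[1]
    clients = await asyncio.gather(*(open_client(port) for _ in range(n_clients)))
    # Every connection is open and idle here: a good moment to watch CPU usage.
    await asyncio.sleep(hold_seconds)
    for _, writer in clients:
        writer.close()
    await asyncio.gather(*(writer.wait_closed() for _, writer in clients))
    server.close()
    await server.wait_closed()
    return sum(ok for ok, _ in clients)


if __name__ == "__main__":
    print(asyncio.run(load_test(100)), "clients connected and echoed")
```

With 2000 clients in one process, the client and server sides together need roughly 4000 file descriptors, so a soft `ulimit -n` of 1024 (a common Linux default) would need raising first.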
Some other behaviour I've noticed: if I turn on manual scaling set to 1 instance, I eventually receive a "429 No available instance" error. The log message is:
"The request was aborted because there was no available instance. Additional troubleshooting documentation can be found at: https://cloud.google.com/run/docs/troubleshooting#abort-request"
Again, while these errors occur, the maximum number of concurrent connections is well below the limit, as are CPU and memory usage.
Is there a way to reset Cloud Run's scaling behaviour back to its initial settings? I wonder if Cloud Run is "remembering" to scale up at certain times of day based on previous load, even when it doesn't need to.
Thanks again!
I'm having exactly the same issue using FastAPI WebSockets. I'm getting "429 The request was aborted because there was no available instance".