Hi! I'm running a WebSocket server on Cloud Run. The settings I currently have are:
During peak hours, the metrics for this service are:
Why is Cloud Run scaling the service so heavily when my CPU usage, memory usage, and number of requests are all well below their respective limits? Am I missing something?
Additional Info:
Any help with this would be greatly appreciated!
One thought I have: if you're using multiple vCPUs, is your code actually capable of utilizing all of them? For example, if you set CPU to 4 and your container really only uses 1 CPU, then reported utilization will look low (at most about 25%) even though the service can't serve any more requests concurrently. Some languages don't do a good job of utilizing multiple CPUs. If this is the case, try setting CPU to 1 and see if that helps. You would see more instances, but each would be cheaper.
Thanks! That's a good suggestion, but unfortunately I'm only using 1 vCPU.
Could it be I/O limits, then? Are you calling a downstream resource that doesn't scale beyond a certain amount? Or are you using a VPC connector with too small an instance size?
I am connecting to another Cloud Run service over a WebSocket (the service this thread refers to essentially acts as a "passthrough" between the other service and a client). However, there is only a single connection between the two services, regardless of the number of concurrent requests.
I am not connecting the services together through a VPC, so I don't think that's the issue.
I will run some tests and check the behaviour of the service when it is not connected to the external resource; hopefully that will narrow it down.
Thanks!
Just to update: I have tried running without this connection and the service still behaves in the same way.
I have also tried load testing locally (with 2000 clients) and the CPU usage of the process did not increase significantly.
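For anyone who wants to reproduce this kind of local check, here is a minimal sketch of the idea, not the actual test I ran. It assumes plain TCP echo connections as a stand-in for WebSocket clients, and the names `echo`, `client`, and `run_load_test` are made up for illustration. It opens many concurrent connections against a local server and compares the process's CPU time to wall-clock time. Since clients and server share one process here, the CPU figure is an upper bound on what the server alone used.

```python
import asyncio
import time


async def echo(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Echo each line back until the client disconnects.
    try:
        while line := await reader.readline():
            writer.write(line)
            await writer.drain()
    finally:
        writer.close()


async def client(port: int, messages: int) -> int:
    # Open one connection, send `messages` lines, and count the echoes received.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    received = 0
    for i in range(messages):
        writer.write(f"ping {i}\n".encode())
        await writer.drain()
        if await reader.readline():
            received += 1
    writer.close()
    await writer.wait_closed()
    return received


async def run_load_test(clients: int, messages: int) -> dict:
    # Port 0 lets the OS pick a free port for the local test server.
    server = await asyncio.start_server(echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    async with server:
        results = await asyncio.gather(
            *(client(port, messages) for _ in range(clients))
        )
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return {
        "echoes": sum(results),
        "wall_s": wall,
        "cpu_s": cpu,
        # Fraction of one core used by this process during the test.
        "cpu_fraction": cpu / wall if wall > 0 else 0.0,
    }


if __name__ == "__main__":
    # Keep the client count modest; large counts can hit the open-file limit.
    print(asyncio.run(run_load_test(clients=200, messages=5)))
```

A `cpu_fraction` well below 1.0 would suggest the process is not CPU-bound at that connection count, which matches what I saw locally.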
Some other behaviour I've noticed: if I turn on manual scaling set to 1 container, I eventually receive a "429 No available instance" error. The log message is:
"The request was aborted because there was no available instance. Additional troubleshooting documentation can be found at: https://cloud.google.com/run/docs/troubleshooting#abort-request"
Again, when these errors occur, the number of concurrent connections is well below the limit, as are CPU and memory usage.
Is there a way to reset the Cloud Run scaling behaviour back to its initial settings? I wonder if Cloud Run is "remembering" to scale at certain times of the day based on previous load when it doesn't need to.
Thanks again!