Hi everyone,
We are running a container on Cloud Run that continuously listens for events over a WebSocket connection and processes them.
It has only one HTTP endpoint, which responds `200 OK` if the WebSocket connection is OK; it is basically just a health check endpoint.
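A health endpoint that only reports whether the socket is open cannot detect a connection that is open but silent. Below is a minimal sketch (Python, standard library only) of a stricter variant that reports unhealthy when no event has been processed recently. `EventTracker`, `MAX_EVENT_AGE_S` and the 300-second threshold are hypothetical names and values, and the approach assumes events normally arrive more often than that threshold.

```python
import os
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical threshold: seconds without a processed event before the
# listener is reported unhealthy. Must exceed the normal gap between events.
MAX_EVENT_AGE_S = 300.0


class EventTracker:
    """Thread-safe record of when the listener last processed an event."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._last_event = clock()  # treat startup as a fresh event

    def mark_event(self):
        # Call this from the WebSocket handler after each processed event.
        with self._lock:
            self._last_event = self._clock()

    def is_healthy(self, max_age=MAX_EVENT_AGE_S):
        with self._lock:
            return self._clock() - self._last_event <= max_age


tracker = EventTracker()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 200 while events keep flowing, 503 once they go stale, so a liveness
        # probe pointed at this endpoint starts failing.
        healthy = tracker.is_healthy()
        body = b"OK" if healthy else b"STALE"
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe requests out of the logs


def serve_health():
    # Cloud Run passes the port to listen on in the PORT environment variable.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("", port), HealthHandler).serve_forever()
```

`serve_health()` would run in a background thread next to the listener. Since a failing liveness probe leads Cloud Run to restart the container, the threshold should sit comfortably above the longest quiet period you expect.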
Since it needs to run continuously and doesn't handle any requests other than the health check, we have configured the Cloud Run service with instance-based billing and scaling, with both min and max instances set to 1.
Both a startup probe and a liveness probe are set up as well.
We do not see any error or shutdown logs, nor any disconnection logs from the WebSocket connection, but after a couple of days it simply stops processing events from the WebSocket and we have to redeploy. After the redeployment it works again.
We added logs to make sure the problem is not the WebSocket connection being closed. We also tried having the code automatically disconnect and reconnect the WebSocket at a fixed interval, to see whether that would keep it listening, but the same problem occurred: it stopped receiving/processing events after a couple of days and required a redeployment.
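A variation on the periodic reconnect is an inactivity watchdog: instead of reconnecting on a timer, reconnect whenever no message arrives within a timeout, with exponential backoff between attempts. The sketch below uses Python asyncio and is library-agnostic; `connect` and `handle` are hypothetical coroutines standing in for the WebSocket client and the event handler, and the 120-second idle timeout assumes the server sends something (events or keepalives) more often than that.

```python
import asyncio
from typing import Awaitable, Callable, Protocol


class Connection(Protocol):
    """Minimal interface assumed for the WebSocket client (hypothetical)."""

    async def recv(self) -> str: ...

    async def close(self) -> None: ...


async def listen_forever(
    connect: Callable[[], Awaitable[Connection]],
    handle: Callable[[str], Awaitable[None]],
    idle_timeout: float = 120.0,
    initial_backoff: float = 1.0,
    max_backoff: float = 60.0,
) -> None:
    """Process events, reconnecting whenever the socket goes quiet or errors."""
    backoff = initial_backoff
    while True:
        conn = None
        try:
            conn = await connect()
            backoff = initial_backoff  # connected: reset the backoff
            while True:
                # Raises TimeoutError if nothing arrives in idle_timeout seconds,
                # which catches "open but dead" sockets that never report a close.
                msg = await asyncio.wait_for(conn.recv(), timeout=idle_timeout)
                await handle(msg)
        # Only timeouts and network errors trigger a reconnect; anything else
        # propagates so the process fails loudly instead of hanging silently.
        except (asyncio.TimeoutError, OSError) as exc:
            print(f"listener: reconnecting after {type(exc).__name__}", flush=True)
        finally:
            if conn is not None:
                await conn.close()
        await asyncio.sleep(backoff)
        backoff = min(backoff * 2, max_backoff)
```

The reconnect message goes to stdout, which Cloud Run forwards to Cloud Logging, so silent stalls become visible as repeated reconnects.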
That's why we think it might have to do with Cloud Run, or with how we are using it. Maybe Cloud Run isn't the best fit for a long-running service such as this WebSocket listener. So we are looking for solutions to the problem, and for suggestions on how to do this better or with a more suitable service.
Happy to share more info as well if needed. Thanks a lot!
Hi @paywithflash,
Welcome to Google Cloud Community!
It seems like the issue you’re facing is related to the load balancer timeout. By default, the load balancer has an idle timeout of 30 seconds for connections to its backend services, including your Cloud Run instance. Even though your WebSocket connection might technically remain "open" at a lower level, if no data is actively sent or received for longer than this 30-second idle timeout, the load balancer might proactively close the connection to conserve resources. Your application might not receive a clear notification of this closure, which would lead to the observed state where it stops processing events until a redeployment restarts the connection. You may try these workarounds to help resolve the problem:
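One generic way to keep a connection from sitting idle past such a timeout is an application-level heartbeat sent more often than the timeout. A minimal sketch, assuming `send_ping` is a hypothetical coroutine that sends a ping and waits for the peer's pong, and `on_dead` is a hypothetical callback that closes the connection so the receive loop can reconnect. If the client happens to use the Python `websockets` library, its `connect()` already accepts `ping_interval` and `ping_timeout` arguments that provide this behavior.

```python
import asyncio
from typing import Awaitable, Callable


async def heartbeat(
    send_ping: Callable[[], Awaitable[None]],
    on_dead: Callable[[], Awaitable[None]],
    interval: float = 20.0,
    timeout: float = 10.0,
) -> None:
    """Ping every `interval` seconds; call `on_dead` if a ping is not answered in `timeout`.

    `interval` should stay well below the idle timeout (e.g. 20 s against 30 s),
    so the connection never goes quiet long enough to be dropped.
    """
    while True:
        await asyncio.sleep(interval)
        try:
            # send_ping is assumed to return only after the peer's pong arrives.
            await asyncio.wait_for(send_ping(), timeout=timeout)
        except (asyncio.TimeoutError, OSError):
            # Peer stopped answering: let the caller close and reconnect.
            await on_dead()
            return
```

The heartbeat would typically run as a separate task, e.g. `asyncio.create_task(heartbeat(...))`, alongside the loop that receives events.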
If the workarounds above don't work, you can contact Google Cloud Support for a more in-depth analysis. When contacting them, please provide comprehensive details and include screenshots; this will help them understand and address your issue.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.