Hi everyone,
I have a Python App (Superset) running on Cloud Run, using a Postgres DB running on a GCE VM.
The connection works fine on a small scale, but as soon as more users start using the application, we start experiencing connection timeouts between the application (Cloud Run) and the database (VM).
Here's a detailed description of my setup and the problem:
Setup:
Problem:
Database connections work normally with a low number of Cloud Run containers. However, as I scale up the number of Cloud Run instances (increasing the number of simultaneous connection attempts), I start experiencing connection timeouts. This occurs even though:
SQLAlchemy Configuration:
My SQLAlchemy engine options are configured as follows:
I've already checked the VPC firewall rules to ensure that inbound traffic on port 5432 is allowed from the Cloud Run service's IP range to my VM. I'm using network tags to allow the traffic.
What I've Tried:
I'm looking for suggestions on how to further diagnose this problem and potential solutions. Any help or advice would be greatly appreciated. Please let me know if any other information would be helpful.
Thanks in advance!
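One way to narrow down where the timeouts occur is to time raw TCP connections to the database port from inside the Cloud Run container, independent of SQLAlchemy. The following is only a minimal sketch: the address `10.0.0.5` is a hypothetical placeholder for the VM's internal IP, and `probe_tcp` and `summarize` are illustrative helper names, not part of any library. If plain TCP connects also time out as instances scale up, the problem is more likely in the network path (firewall, routing) than in Postgres or the connection pool.

```python
from __future__ import annotations

import socket
import time


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> tuple[bool, float]:
    """Attempt one TCP connection; return (succeeded, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False, time.monotonic() - start


def summarize(results: list[tuple[bool, float]]) -> dict:
    """Count failures and report the slowest successful connect."""
    ok_times = [elapsed for ok, elapsed in results if ok]
    return {
        "attempts": len(results),
        "failures": len(results) - len(ok_times),
        "slowest_success": max(ok_times, default=None),
    }


if __name__ == "__main__":
    # Hypothetical internal IP of the Postgres VM; replace with your own.
    DB_HOST, DB_PORT = "10.0.0.5", 5432
    results = [probe_tcp(DB_HOST, DB_PORT) for _ in range(20)]
    print(summarize(results))
```

Running this at low and high instance counts and comparing the failure counts gives a rough signal of whether the timeouts track scale.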
I've heard that many databases don't work well when a large number of clients connect to them. I would suggest looking at your database's documentation to see if there's anything they say on this topic.
If the issue is too many Cloud Run instances, you might be able to run fewer, larger Cloud Run instances (more CPU/memory).
If the issue is too many IP addresses, you could try using VPC Connectors instead of Direct VPC; this would reduce the number of IPs connecting to the database.
If the issue is the number of connections/concurrent requests, you could try reducing the concurrency of your service, and running a larger number of smaller instances, each of which only processes a small number of requests.
Sorry I don't have more concrete advice.
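To check the "too many connections" hypothesis with numbers, a rough worst-case budget can help. The sketch below assumes each process holds one SQLAlchemy `QueuePool`, which can open at most `pool_size + max_overflow` connections, and compares the total against the Postgres `max_connections` setting minus the slots reserved for superusers. The instance count and the function names are hypothetical; the pool values shown are SQLAlchemy's defaults and the Postgres values are its defaults, so substitute your own settings.

```python
def worst_case_connections(instances: int, pool_size: int, max_overflow: int,
                           processes_per_instance: int = 1) -> int:
    """Upper bound on connections if every pool is fully used."""
    return instances * processes_per_instance * (pool_size + max_overflow)


def client_slots(max_connections: int = 100, reserved: int = 3) -> int:
    """Connections available to ordinary clients on the Postgres side."""
    return max_connections - reserved


def max_safe_instances(pool_size: int, max_overflow: int,
                       processes_per_instance: int = 1,
                       max_connections: int = 100, reserved: int = 3) -> int:
    """Largest instance count whose worst case still fits the server limit."""
    per_instance = processes_per_instance * (pool_size + max_overflow)
    return client_slots(max_connections, reserved) // per_instance


if __name__ == "__main__":
    # Hypothetical: 20 Cloud Run instances, one process each, default pool sizes.
    print(worst_case_connections(20, pool_size=5, max_overflow=10))  # 300
    print(client_slots())                                            # 97
    print(max_safe_instances(pool_size=5, max_overflow=10))          # 6
```

If the worst case is well under the server limit, as the database logs can confirm, connection exhaustion is unlikely to be the cause.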
Hi @knet, thanks for your help! I did check the database settings, logs, etc., and they didn't seem to be the source of the problem.
However, I resolved the problem by no longer using network tags in the firewall rule that allows Cloud Run to connect to the VM via Direct VPC. I switched to the subnetwork's CIDR IP range instead, which did the trick. I found this solution in these posts:
https://www.googlecloudcommunity.com/gc/Infrastructure-Compute-Storage/DIRECT-VPC-for-cloud-run/m-p/...
https://stackoverflow.com/questions/79086615/cloud-run-direct-vpc-egress-connection-timeout-issue
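When switching a firewall rule from network tags to an IP range, it can be worth checking that the source addresses seen in the VM's logs actually fall inside the range you allow. Here is a small sketch using Python's standard `ipaddress` module; the CIDR `10.8.0.0/26` and the sample addresses are hypothetical, and `covered_by_rule` is just an illustrative helper name.

```python
import ipaddress


def covered_by_rule(source_ip: str, allowed_ranges: list) -> bool:
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


if __name__ == "__main__":
    # Hypothetical subnet used by Cloud Run Direct VPC egress.
    allowed = ["10.8.0.0/26"]
    print(covered_by_rule("10.8.0.17", allowed))  # True: inside 10.8.0.0-10.8.0.63
    print(covered_by_rule("10.8.1.5", allowed))   # False: outside the subnet
```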
Thank you very much for sharing! I had the same struggle accessing an NFS server on a VM from Cloud Run (posted here). Adding a firewall rule with the subnet instead of the network tag seems to have solved it! I implemented the change yesterday, and in 24 hours I have had no more mount timeout issues!