Hi! I am trying to connect to my Cloud SQL PostgreSQL database from Datastream (to get data to BigQuery). I am using public IP, but since it's set up to require trusted client certificates, my understanding is that I need to use Forward-SSH tunnel as connectivity method from Datastream (please correct me if I'm wrong on this).
According to this guide, it should be possible to use the database server to terminate the SSH tunnel. It does not specify in detail how to do it, only to "allow network traffic to reach the tunnel server or database host via SSH, which is generally on TCP port 22".
I am not able to connect, and getting the error
"We can't connect to the data source. Verify that the Forward SSH tunnel configuration is configured correctly in the connection profile and that the port is open on the SSH tunnel server."
I created a connectivity test on the database confirming that port 22 is not reachable.
How can I get this to work? Is there any setting up that needs to be done on the database side? How do I allow traffic on port 22? Or should I use a different port?
Appreciate any help on this!
To connect to your Cloud SQL PostgreSQL database from Datastream (to get data to BigQuery) using a Forward-SSH tunnel as the connectivity method, you will need to:
Create an SSH Tunnel Server:
Configure the SSH Tunnel Server to Allow Traffic on Port 22:
/etc/ssh/sshd_config
) and ensure it has the line Port 22
. Restart the SSH server afterward.Create a Datastream Connection Profile for the PostgreSQL Database:
Create a Datastream Stream and Start Replication:
Troubleshooting:
ufw
or firewalld
on Linux)./var/log/auth.log
on Linux, for any SSH connection attempts and potential issues.By following these steps, you should be able to establish a successful SSH tunnel from Datastream to your Cloud SQL PostgreSQL database.
I understand that as a no to whether it is possible to use the database server (cloud SQL) as SSH termination.
Turns out it is not possible for Datastream to connect to the database via public IP at all, even through SSH, as long as it requires trusted client certificates.
For the record, I ended up going with federated queries from Big Query instead of Datastream, since my only remaining option was to use a reverse proxy through private IP / VPC, which was too much overhead for being worth it.
I understand the challenges you've faced in trying to establish a secure and reliable connection to your Cloud SQL PostgreSQL database using Datastream. It's indeed true that Cloud SQL instances do not support direct SSH access for terminating SSH tunnels, and the requirement for trusted client certificates adds a layer of security that complicates the use of public IP connections in this context.
Federated queries in BigQuery offer a good alternative for accessing external data sources like Cloud SQL. They allow you to run queries directly on external databases without moving data into BigQuery, which can be a more straightforward solution when dealing with secure connections that require client certificates.
Using federated queries can simplify the architecture by removing the need for an SSH tunnel and a reverse proxy setup, which would indeed add complexity and overhead. It's a solution that leverages BigQuery's built-in capabilities to query external data sources, thus streamlining the process.
User | Count |
---|---|
5 | |
2 | |
1 | |
1 | |
1 |