
Connect external Postgres to BigQuery

Hi all,

We have a Postgres database (Amazon-hosted), and there is one view (10 columns) that I need to transfer to BigQuery on a daily basis. What would be the best solution for this? It must be automatic.

Can I create an ODBC connection and somehow link the servers?

I'm very confused by GCP; there are so many services and functions, and I couldn't find any simple solution for my issue.

Thank you in advance for your answers!


Unfortunately, direct ODBC connection between your PostgreSQL instance (hosted on AWS) and BigQuery isn't currently supported. However, there are several effective approaches:

Datastream: This is ideal for near real-time replication of entire databases, but might be a bit heavy-duty for just a single view.

Custom ETL Pipelines: This gives you complete control using tools like Cloud Composer (Apache Airflow) or Dataflow (Apache Beam). However, it requires more technical setup and maintenance.

Third-Party Tools: Services like Fivetran or Stitch specialize in data integration and could simplify the process, but often come with a cost.

My Recommendation: Dataflow + Cloud Scheduler + Pub/Sub (Google Cloud)

This approach is well-suited for your needs, as it leverages the power and flexibility of Google Cloud Platform:

  • Dataflow: A fully managed service for creating data processing pipelines.
  • Cloud Scheduler: Allows you to schedule jobs, like triggering your data transfer daily.
  • Pub/Sub: A messaging service that helps coordinate the different parts of your solution.

How It Works

  1. Cloud Scheduler triggers a daily job.
  2. The job sends a message to Pub/Sub.
  3. Pub/Sub activates your Dataflow pipeline.
  4. Dataflow securely extracts data from your PostgreSQL view.
  5. Dataflow loads the data into your BigQuery table.
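
For steps 4 and 5, a minimal Beam pipeline in Python could look roughly like the sketch below. Every name in it (JDBC URL, credentials, view, bucket, and BigQuery table) is a placeholder, the destination table is assumed to already exist, and ReadFromJdbc is a cross-language transform that needs the PostgreSQL JDBC driver available when the pipeline is expanded:

    import apache_beam as beam
    from apache_beam.io.jdbc import ReadFromJdbc
    from apache_beam.options.pipeline_options import PipelineOptions

    # Placeholder connection details -- replace with your own.
    JDBC_URL = "jdbc:postgresql://my-rds-host:5432/mydb"
    BQ_TABLE = "my_project:my_dataset.daily_view_copy"

    def run():
        options = PipelineOptions(
            runner="DataflowRunner",
            project="my_project",
            region="us-central1",
            temp_location="gs://my-bucket/tmp",
        )
        with beam.Pipeline(options=options) as p:
            (
                p
                # Read the 10-column view from Postgres over JDBC.
                | "ReadView" >> ReadFromJdbc(
                    table_name="my_view",
                    driver_class_name="org.postgresql.Driver",
                    jdbc_url=JDBC_URL,
                    username="reader",
                    password="change-me",
                    query="SELECT * FROM my_schema.my_view",
                )
                # ReadFromJdbc yields named-tuple rows; WriteToBigQuery wants dicts.
                | "ToDict" >> beam.Map(lambda row: row._asdict())
                # Overwrite yesterday's snapshot; the table is assumed to exist already.
                | "WriteToBQ" >> beam.io.WriteToBigQuery(
                    BQ_TABLE,
                    write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
                    create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                )
            )

    if __name__ == "__main__":
        run()

In practice you would package this as a Dataflow Flex Template so the Pub/Sub-triggered step can launch it on schedule, rather than running it by hand.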

Thank you for your answer!

One question: are you creating the view just for BigQuery or is it used for something else?
 
The view is really just a query of your underlying tables. Most people extract those tables first, and then either transform in transit (ETL) or in BigQuery (ELT). Transforming in the source database (TEL) is less common.
 
So my recommendation would be:
  1. Use change data capture (CDC) to extract the source tables. This is the fastest and lowest load way to get data from Postgres. It will also get you all the deletes.
  2. If you can't use CDC then batch-extract the source, or possibly the view. But you'll have more latency, more of a load on Postgres if the view is bigger (and only for BigQuery), and you'll also need to manage the deletes.
You miss deletes with batch extractions because when a row is deleted in Postgres, it simply isn't in the next extract, so nothing signals the deletion. The only way to catch it with batch is to diff the old and new extracts and look for missing rows. CDC gives you all the change data.
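
For illustration, if you keep yesterday's and today's batch extracts as two BigQuery tables, the diff for deletes looks roughly like this (the table names and the id column are made up):

    from google.cloud import bigquery

    client = bigquery.Client()

    # Rows present yesterday but missing today were deleted in Postgres.
    sql = """
    SELECT old.id
    FROM `my_project.my_dataset.extract_yesterday` AS old
    LEFT JOIN `my_project.my_dataset.extract_today` AS new
      ON old.id = new.id
    WHERE new.id IS NULL
    """
    deleted_ids = [row.id for row in client.query(sql).result()]
    print(f"{len(deleted_ids)} rows were deleted since the last extract")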
 
Some ETL/ELT tools will do the deletions for you in BigQuery as part of CDC. In Estuary Flow we call this a standard materialization.
 
Good luck!

Hi @lucierabochova, if you need to automate the daily transfer of a PostgreSQL (Amazon RDS) view to BigQuery, here are a few options to consider:

Using Dataflow (Managed by Google Cloud)

  • You can configure Dataflow (Apache Beam) to extract data from PostgreSQL and load it into BigQuery.
  • While this requires some setup, it’s a fully managed service within GCP.

Using Cloud SQL + Scheduled Queries

  • A simpler alternative is to use Cloud SQL for PostgreSQL (which would mean replicating your RDS database there) and register it as an external connection in BigQuery.
  • You can then schedule a federated query in BigQuery to pull the required data daily, as sketched below.
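
If you go the Cloud SQL route, the daily job boils down to one federated query. A rough sketch, run here through the Python client (the connection ID, dataset, and view names are placeholders):

    from google.cloud import bigquery

    client = bigquery.Client()

    # EXTERNAL_QUERY runs against a Cloud SQL Postgres connection
    # registered in BigQuery; the IDs below are placeholders.
    sql = """
    CREATE OR REPLACE TABLE `my_project.my_dataset.daily_view_copy` AS
    SELECT *
    FROM EXTERNAL_QUERY(
      'my_project.us.my_cloudsql_connection',
      'SELECT * FROM my_schema.my_view'
    )
    """
    client.query(sql).result()

The same SQL can be pasted into a BigQuery scheduled query so it runs daily with no code at all.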

Using a Third-Party Connector (Faster & More Flexible)

  • Windsor.ai offers an automated PostgreSQL to BigQuery connector, allowing you to schedule data transfers without managing infrastructure.
  • This is a great option if you're looking for a no-code solution that eliminates the need for complex GCP configurations.

Since you're finding GCP a bit overwhelming, you might want to start with a third-party tool like Windsor.ai for a quick setup. Once you're comfortable, you can explore GCP-native options like Dataflow or Cloud SQL for a more customized approach.

Hope this helps! 

You can use Google Cloud Dataflow or Cloud Functions with a scheduled trigger to automate the transfer. Another simple option is BigQuery Data Transfer Service with AWS S3 as an intermediary (export Postgres data to S3, then load it into BigQuery). Direct ODBC linking is not natively supported for BigQuery.
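
To make the Cloud Functions option concrete, a small Pub/Sub-triggered function (Cloud Scheduler → Pub/Sub → function) that re-copies the view each day could look like the sketch below. The connection string, view, and table ID are placeholders, and it assumes the view is small enough to pull in one query and contains only JSON-friendly column types:

    import os

    import psycopg2
    from google.cloud import bigquery

    # Placeholders -- in practice keep credentials in Secret Manager or env vars.
    PG_DSN = os.environ.get(
        "PG_DSN", "host=my-rds-host dbname=mydb user=reader password=change-me"
    )
    BQ_TABLE = "my_project.my_dataset.daily_view_copy"

    def transfer_view(event, context):
        """Pub/Sub-triggered entry point: copy the Postgres view into BigQuery."""
        # Pull the whole view (about 10 columns, once a day, so a full reload is fine).
        with psycopg2.connect(PG_DSN) as conn, conn.cursor() as cur:
            cur.execute("SELECT * FROM my_schema.my_view")
            columns = [col.name for col in cur.description]
            rows = [dict(zip(columns, record)) for record in cur.fetchall()]

        # Overwrite the BigQuery table with today's snapshot.
        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig(
            autodetect=True,
            write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
        )
        client.load_table_from_json(rows, BQ_TABLE, job_config=job_config).result()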