
How to load data from external sources into BigQuery?

Hi,

I'm working on a project where I need to load data into BigQuery from multiple cloud and on-prem sources (like SQL Server, PostgreSQL, and some SaaS apps). I’d like to automate this as much as possible, ideally without writing a lot of custom code or managing pipelines manually.

Are there any tools or workflows you’d recommend for this kind of setup? Open to both native GCP solutions and third-party options. Thanks in advance!

Solved
2 ACCEPTED SOLUTIONS

There are plenty of tools to do this with minimal code or ops headaches.

Native GCP Options:

  • BigQuery Data Transfer Service (DTS) – Great for native Google connectors (Ads, Analytics, YouTube), but it doesn't cover external databases like SQL Server or PostgreSQL.

  • Cloud Data Fusion – Drag-and-drop UI, supports on-prem and SaaS, including SQL Server and PostgreSQL. Runs on managed Dataproc (Spark under the hood). A bit heavy, but solid for complex flows. (Can get expensive at scale if not monitored well.)

  • Cloud Composer (Airflow) – Overkill unless you’re doing multi-step workflows or already familiar with Airflow.

  • Cloud Functions + Scheduler – Good for lightweight stuff: think polling APIs, writing to GCS, triggering BigQuery load jobs. Still requires code, so not ideal for scaling to many sources.
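To give a feel for how small the Cloud Functions + Scheduler pattern can be, here's a rough sketch of its core: assembling a BigQuery load job configuration for data already staged in GCS. The bucket, dataset, and table names are placeholders; a real function would pass the equivalent options to the google-cloud-bigquery client after polling the source API.

```python
import json

def build_load_job_config(gcs_uri: str, dataset: str, table: str) -> dict:
    """Assemble a REST-style BigQuery load JobConfiguration.

    A Cloud Function triggered by Cloud Scheduler would submit this
    (or the equivalent client-library options) after staging fresh
    source data in GCS as newline-delimited JSON.
    """
    return {
        "load": {
            "sourceUris": [gcs_uri],
            "destinationTable": {"datasetId": dataset, "tableId": table},
            "sourceFormat": "NEWLINE_DELIMITED_JSON",
            "writeDisposition": "WRITE_APPEND",  # append on each scheduled run
            "autodetect": True,  # let BigQuery infer the schema
        }
    }

config = build_load_job_config(
    "gs://my-bucket/exports/events-*.json", "raw", "events"
)
print(json.dumps(config, indent=2))
```

The point is that for a single simple source this is all the "pipeline" there is — which is also why it stops being fun once you have twenty of them.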

Third-Party ELT Tools:

  • Fivetran – The gold standard. Plug-and-play for almost everything: PostgreSQL, SQL Server (CDC supported), Salesforce, HubSpot, etc. Managed and reliable, but pricing can sting at high volumes.

  • Hevo Data – Similar to Fivetran but easier on the budget. Great UI, handles real-time sync well.

  • Stitch Data – Basic, affordable, does the job.

What I'd do, and what I'd suggest for you:

  1. Use Fivetran/Hevo for all major sources (SQL Server, PostgreSQL, SaaS).

  2. For weird or niche sources, use Cloud Functions or Data Fusion.

  3. Do transformations using dbt Cloud or BQ scheduled queries.

  4. Set up Cloud Monitoring for alerts/logging.
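For step 3, a hedged illustration: whether it runs as a dbt model or a BigQuery scheduled query, the transformation itself is usually just a SQL statement that rebuilds a reporting table from the raw layer. All table and column names below are made up.

```python
def render_daily_rollup(project: str, raw_table: str, dest_table: str) -> str:
    """Render the kind of SQL a BigQuery scheduled query (or a dbt model)
    would run nightly to turn raw loaded rows into a reporting table."""
    return f"""
CREATE OR REPLACE TABLE `{project}.{dest_table}` AS
SELECT
  DATE(event_ts) AS event_date,
  source_system,
  COUNT(*) AS row_count
FROM `{project}.{raw_table}`
GROUP BY event_date, source_system
""".strip()

sql = render_daily_rollup("my-project", "raw.events", "reporting.daily_events")
print(sql)
```

Keeping transformations in SQL like this (rather than inside the ingestion tool) is what lets you swap Fivetran for Hevo later without touching your reporting layer.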

 


I've worked on a similar setup recently and found a few approaches that helped.

For native GCP, Dataflow with templates and BigQuery Data Transfer Service can be solid options, especially if you're okay with some configuration and scripting. But if you're looking for something more automated and less hands-on, I'd also look into no-code integration platforms. Generally, they support a wide range of data sources (cloud apps, on-prem DBs) and let you set up scheduled or continuous data loads into BigQuery without coding. This guide on loading data into BigQuery walks through the process step-by-step.
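To illustrate the Dataflow-with-templates route: Google provides a Jdbc_to_BigQuery classic template that can be launched with gcloud. The sketch below only assembles that command as a list for subprocess — nothing here contacts GCP — and the connection URL, bucket, and table names are placeholders; a real run would also need driver JARs, a driver class name, and credentials.

```python
def dataflow_jdbc_command(job_name: str, jdbc_url: str, query: str,
                          output_table: str, temp_dir: str) -> list:
    """Assemble a gcloud invocation for the Jdbc_to_BigQuery template.

    Run the result with subprocess.run(cmd) in an environment where
    gcloud is installed and authenticated.
    """
    # NB: queries containing commas need careful quoting in --parameters.
    params = ",".join([
        f"connectionURL={jdbc_url}",
        f"query={query}",
        f"outputTable={output_table}",
        f"bigQueryLoadingTemporaryDirectory={temp_dir}",
    ])
    return [
        "gcloud", "dataflow", "jobs", "run", job_name,
        "--gcs-location", "gs://dataflow-templates/latest/Jdbc_to_BigQuery",
        "--region", "us-central1",
        "--parameters", params,
    ]

cmd = dataflow_jdbc_command(
    "orders-nightly",
    "jdbc:postgresql://10.0.0.5:5432/shop",
    "SELECT * FROM orders",
    "my-project:raw.orders",
    "gs://my-bucket/tmp",
)
print(" ".join(cmd))
```

Paired with Cloud Scheduler, this gives you a scripted but low-maintenance nightly pull from an on-prem database without writing a pipeline from scratch.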


4 REPLIES 4


Thank you for your responses! I will check them and let you know.

You can use Integration Connectors to create connections with third-party sources, then use those connections in Application Integration.