
How to automate data refresh cycles (ETL from 3rd party SaaS to BQ) to keep BQ tables in sync

We have a lightweight ETL pipeline on Cloud Run that pulls data from a 3rd party SaaS application via its APIs and loads it into BQ tables, using GCS as temporary storage. We are facing a challenge: refreshing the BQ tables daily via BQ inserts is costly. Is there a better way to automate the data refresh cycle, so that any data that changes in the 3rd party software is reflected in BQ to keep the tables in sync (an upsert operation)? This operation needs to run at a tunable frequency (once per day, every 3 hours, once a week, etc.) via a Cloud Scheduler job.

Solved
1 ACCEPTED SOLUTION

Hi @dheerajpanyam,

Welcome back to Google Cloud Community!

Some suggestions to improve your data refresh cycles while keeping your BQ tables in sync with the 3rd party SaaS:

  • Upsert operations: load each extract into a staging table, then apply a MERGE statement so only changed rows are updated or inserted.
  • Partitioning and materialized views: improve query performance and reduce scan costs.
  • Cloud Scheduler: trigger the ETL pipeline at whatever interval you need.
  • Incremental API polling or change data capture (CDC): pull only records changed since the last run, depending on what the SaaS APIs support.
  • Error handling, monitoring, data validation, and lifecycle management: ensure data quality and pipeline reliability.
  • For more complex needs, consider tools like Cloud Composer, Dataflow, or Pub/Sub.

Remember: Choose the solution that best fits your specific needs and SaaS capabilities.
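As a rough sketch of the MERGE-based upsert from the first bullet (all table and column names below are hypothetical placeholders, adapt them to your schema): after the Cloud Run job batch-loads the GCS extract into a staging table (batch load jobs from GCS are free in BigQuery, unlike streaming inserts), it can submit a MERGE like the one this small helper builds.

```python
def build_merge_sql(target: str, staging: str, key: str, cols: list[str]) -> str:
    """Build a BigQuery MERGE that upserts staging rows into the target table.

    target/staging are fully qualified table names, key is the unique
    business key, cols are the non-key columns to keep in sync.
    """
    set_clause = ", ".join(f"T.{c} = S.{c}" for c in cols)
    insert_cols = ", ".join([key] + cols)
    insert_vals = ", ".join(f"S.{c}" for c in [key] + cols)
    return (
        f"MERGE `{target}` T\n"
        f"USING `{staging}` S\n"
        f"ON T.{key} = S.{key}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )

# Hypothetical tables; the Cloud Run job would submit this statement
# (e.g. via the BigQuery client library) after each staging load.
sql = build_merge_sql(
    "myproject.sales.customers",        # target table (placeholder)
    "myproject.staging.customers_new",  # staging table loaded from GCS
    "customer_id",
    ["name", "email", "updated_at"],
)
print(sql)
```

Because only the staging table and the matched partitions of the target are touched, this is typically far cheaper than re-inserting or rewriting full tables on every refresh.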


2 REPLIES

Thanks so much @christianpaula 🙏