
How to use our current Python ETL pipelines to populate and keep BigQuery tables in sync?


  • We have ETL pipelines in Python that populate a PostgreSQL database used for our reports (currently built in Looker Studio).
  • We want to point some of these pipelines at BigQuery to evaluate replacing Postgres with BigQuery.
  • We run UPSERT operations every hour, and every day we resynchronize the previous day.
    Occasionally we also have to resynchronize the database from the beginning of the year.
  • Writing direct BigQuery upsert (MERGE) queries would be particularly expensive, since each one scans the entire target table.
  • We noticed that Analytics historicizes data by creating a sub-table for each day.
  • Can we use the same approach in our current ETL pipelines and update only one day at a time?
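One way to get the "one sub-table per day" behavior with native BigQuery features is date partitioning rather than sharded tables: a load job aimed at a `table$YYYYMMDD` partition decorator with `WRITE_TRUNCATE` replaces only that day, and a MERGE whose `ON` clause filters the partitioning column lets BigQuery prune the target scan to one partition. The sketch below only builds the table reference and SQL text (project, dataset, table, and column names are all placeholder assumptions, not from the original post); the actual load or query call would go through the `google-cloud-bigquery` client.

```python
from datetime import date

# Placeholder identifiers -- substitute your own project/dataset/table.
PROJECT, DATASET, TABLE = "my-project", "reports", "orders"

def partition_decorator(day: date) -> str:
    """Target a single day's partition of a date-partitioned table.

    Loading into `table$YYYYMMDD` with write disposition WRITE_TRUNCATE
    replaces only that partition, so a daily resync rewrites one day,
    not the whole table.
    """
    return f"{PROJECT}.{DATASET}.{TABLE}${day:%Y%m%d}"

def partition_scoped_merge(day: date) -> str:
    """Build a MERGE restricted to one day's partition.

    The filter on the partitioning column (assumed here to be
    `event_date`) in the ON clause lets BigQuery prune the target-side
    scan to a single partition instead of the entire table.
    """
    return f"""
    MERGE `{PROJECT}.{DATASET}.{TABLE}` T
    USING `{PROJECT}.{DATASET}.{TABLE}_staging` S
    ON T.id = S.id
       AND T.event_date = DATE '{day:%Y-%m-%d}'
    WHEN MATCHED THEN
      UPDATE SET amount = S.amount
    WHEN NOT MATCHED THEN
      INSERT (id, event_date, amount)
      VALUES (S.id, S.event_date, S.amount)
    """

print(partition_decorator(date(2024, 1, 15)))
# my-project.reports.orders$20240115
```

For the yearly resync, the same pattern applies per day: iterate over the dates and rewrite each partition in turn, so costs stay proportional to the days actually touched rather than the full table size.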