How can we use our current Python ETL pipelines to populate BigQuery tables and keep them in sync?
- We have several Python ETL pipelines that populate a PostgreSQL database backing our reports (currently built in Looker Studio).
- We want to point some of these pipelines at BigQuery to evaluate replacing Postgres with BigQuery.
- We run UPSERT operations every hour, and once a day we resynchronize the previous day's data.
- Occasionally we also need to resynchronize the database from the beginning of the year.
- Writing direct BigQuery upsert (MERGE) queries would be particularly expensive, since each query is billed for a scan of the entire target table (see the first sketch after this list).
- We noticed that Google Analytics stores its historical data in BigQuery by creating a separate date-sharded table for each day.
- Can we use the same approach in our ETL pipelines, so that each run rewrites only a single day? (See the second sketch below for what we have in mind.)
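
For reference, here is a minimal sketch of the kind of MERGE-based upsert we are worried about, using the google-cloud-bigquery Python client. The project, dataset, table, and column names are all hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical tables: an hourly staging table merged into the full
# reporting table. On a non-partitioned target, BigQuery bills this
# MERGE for a scan of the entire target table on every run.
merge_sql = """
MERGE `my_project.reporting.events` AS target
USING `my_project.staging.events_last_hour` AS source
ON target.event_id = source.event_id
WHEN MATCHED THEN
  UPDATE SET payload = source.payload, updated_at = source.updated_at
WHEN NOT MATCHED THEN
  INSERT (event_id, event_date, payload, updated_at)
  VALUES (source.event_id, source.event_date, source.payload, source.updated_at)
"""

job = client.query(merge_sql)
job.result()  # block until the merge job finishes
```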
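
And here is a minimal sketch of the per-day approach we have in mind: loading one day's rows into a date-partitioned table through a partition decorator (`table$YYYYMMDD`) with WRITE_TRUNCATE, so that only that day's partition is replaced instead of the whole table. Again, all names and the row schema are hypothetical:

```python
import datetime
from google.cloud import bigquery

client = bigquery.Client()

# Resynchronize yesterday only.
day = datetime.date.today() - datetime.timedelta(days=1)

# The "$YYYYMMDD" partition decorator targets a single partition;
# WRITE_TRUNCATE then replaces only that partition, not the table.
table_id = f"my_project.reporting.events${day:%Y%m%d}"

job_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="event_date",  # hypothetical partitioning column
    ),
)

# One day's rows, as produced by the existing ETL step (hypothetical schema).
rows = [
    {"event_id": "abc123", "event_date": day.isoformat(),
     "payload": "{}", "updated_at": "2024-01-02T03:04:05Z"},
]

load_job = client.load_table_from_json(rows, table_id, job_config=job_config)
load_job.result()  # block until the load job finishes
```

Presumably the beginning-of-year resync could then reuse the same step by looping over the days to rewrite one partition at a time.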