Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Changing primary key settings in Postgres isn't synced across to BigQuery

I have a Datastream set up to sync data from Postgres to BigQuery.

Originally, the tables in Postgres didn't have primary keys so the tables in BigQuery also didn't have any primary keys. 

I have changed the tables in Postgres to have primary keys and then deleting and manually backload the data in BigQuery but the new tables in BigQuery still don't have the primary keys that I have set up in Postgres.

What should I do to get the primary keys synced from Postgres to BigQuery? Set up a new Datastream?

 

0 1 174
1 REPLY 1

To sync primary keys from Postgres to BigQuery using Datastream, it's important to understand the limitations and capabilities of the tools involved. Datastream can handle basic schema evolution, such as adding or removing columns, but it does not automatically propagate primary key constraints or other advanced schema changes to BigQuery. Moreover, BigQuery itself does not enforce primary keys in the same manner as relational databases like Postgres.

Given these limitations, to reflect primary keys in BigQuery, you have two main options. The first is to manually modify the schema of your existing BigQuery tables. This could involve adding unique indexes or using SQL to enforce uniqueness, though these measures will not create true primary keys. Alternatively, you might consider creating new BigQuery tables that better align with the updated schema, including fields that represent primary keys. In this case, you would need to migrate your existing data to these new tables and adjust your Datastream configuration or ETL process accordingly.

In some cases, the most reliable approach might be to recreate the Datastream entirely. By setting up a new Datastream, you ensure that the service recognizes the updated schema, including primary keys, from the outset. If you choose this route, it's essential to backfill your data into BigQuery to maintain consistency.

It's also important to ensure you are using the latest version of Datastream, as earlier versions may have limited support for schema changes. Additionally, testing these changes in a staging environment before applying them in production is critical to avoid unexpected issues and minimize downtime.

In summary, Datastream does not automatically sync primary keys to BigQuery, and due to BigQuery's limitations, manual intervention is often necessary. Whether you choose to manually adjust the schema, create new tables, or recreate the Datastream, careful planning and monitoring are key to ensuring a successful update.