Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

datastream handle schema change

I have datastream set up to stream cloudsql postgres tables to BigQuery. It seems like DataStream can't handle the backward compatible schema changes like adding a new nullable column.

 

Is that expected?

1 1 387
1 REPLY 1

Datastream is designed to handle various types of schema changes automatically. However, adding a new nullable column in your Cloud SQL Postgres database may not always be immediately reflected in your BigQuery destination. 

Typically, Datastream detects new columns and adds them to the BigQuery schema for new rows, though this change may not apply to existing rows that were replicated before the schema change. When columns are deleted, Datastream generally ignores the deletion, resulting in those columns being populated with NULL values in the BigQuery table. Changes in data types are often managed smoothly, provided the new type is compatible with the existing data.

The issue of schema changes not being reflected might be due to several factors. There can be a propagation delay where the schema change takes time to appear in BigQuery. More complex changes, such as altering a column's nullability, might not be automatically handled by Datastream. Additionally, the specific configuration of your Datastream setup could influence how schema changes are managed.

To address this, you can take several approaches. Sometimes, simply waiting and monitoring the Datastream and BigQuery table will show that the new column appears after a delay. If the change does not propagate automatically, manually updating the BigQuery schema to include the new column is an option. For changes that need to reflect in existing rows, recreating the BigQuery table within the Datastream configuration might be necessary, though this process will create a data gap and require manual backfilling.

For further assistance, reaching out to Google Cloud Support can provide specific troubleshooting help and ensure your schema changes are correctly managed. Regularly monitoring Datastream logs for errors or warnings and keeping your Datastream and BigQuery versions up-to-date can also help maintain smooth schema change management.