
Datastream: Column exclusion does not work with Backfilling


Hello,

I have excluded certain columns of certain tables from my Postgres-to-BigQuery replication.

However, this setting does not seem to persist: when I press "Backfill", I can see an ALTER command in the logs that adds the excluded columns to BigQuery.

As a result, data that I don't want replicated ends up being replicated.

In addition, when I go back to the settings screen via `Edit source configuration` -> `Select objects to exclude`, my selection is not persisted: after I save it and reopen the dialog, it is gone.

I also tried assigning column-level privileges to the replication user in Postgres, so that it cannot read the excluded columns, but then I get permission errors!
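
For reference, the grants look roughly like this (table, column, and role names are simplified placeholders):

```sql
-- Swap the blanket table grant for a column-level SELECT grant, so the
-- replication user can only read the columns I want replicated.
REVOKE SELECT ON public.customers FROM datastream_user;
GRANT SELECT (id, email, created_at) ON public.customers TO datastream_user;
```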


The option "To exclude some objects from automatic backfill, define them below." does not seem to persist columns either.


I think this might be more of a UI problem, but I'm not sure. I hope you're able to identify the issue.

Thank you!
Martin


Hi @mtomov,

Welcome to the Google Cloud Community!

I understand that you want to exclude specific columns when you replicate your PostgreSQL database to BigQuery. However, the excluded columns are still replicated to BigQuery when a backfill is initiated.

Here are a few possible reasons why this could have happened:

  • Known limitations when using PostgreSQL as a database source. Datastream has documented limitations when working with PostgreSQL columns.
  • Insufficient IAM permissions. Confirm that your Google Cloud account has the Datastream Admin role (`roles/datastream.admin`), which grants the permissions required to change your Datastream source configuration; see the example after this list.
  • Backfill historical data is unchecked. Confirm whether the Backfill historical data option is checked. When it is enabled, Datastream streams all existing data, in addition to ongoing changes to the included objects (columns), from the source into the destination.
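
For the IAM point above, here is one way to grant the role from Cloud Shell (the project ID and user email are placeholders):

```bash
# Grant the Datastream Admin role to the account that edits the stream.
gcloud projects add-iam-policy-binding my-project \
  --member="user:you@example.com" \
  --role="roles/datastream.admin"
```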

Troubleshooting steps and workarounds you can try:

  • Manually initiate a backfill. If Backfill historical data is unchecked, you can manually initiate a backfill for an object to push your changes to the destination.
  • Include specific schemas and tables instead of excluding. If your use case supports it, or if your database is not that large, consider explicitly including the schemas, tables, and columns you want replicated instead of excluding columns; see the sketch after this list.
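
As a sketch of the include-list approach: the stream's PostgreSQL source configuration accepts `includeObjects` (and `excludeObjects`) down to the column level. The field names below follow my reading of the Datastream API's `PostgresqlSourceConfig`, the schema, table, and column names are placeholders, and other source settings such as the replication slot and publication are omitted, so treat this as an illustration rather than a drop-in config:

```json
{
  "includeObjects": {
    "postgresqlSchemas": [
      {
        "schema": "public",
        "postgresqlTables": [
          {
            "table": "customers",
            "postgresqlColumns": [
              { "column": "id" },
              { "column": "email" },
              { "column": "created_at" }
            ]
          }
        ]
      }
    ]
  }
}
```

A file with this shape can be supplied when creating or updating the stream with `gcloud datastream streams create`/`update` via the `--postgresql-source-config` flag (double-check the exact flag names with `--help` on your gcloud version), instead of relying on the exclusion UI.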

Additional documentation:

  • Refer to our Datastream best practices. These offer more guidance on how to optimize your Datastream workflows.
  • File a feature request on our public issue tracker. If you believe this is a bug or have a suggestion, you can report it on our public issue tracker, where users share their ideas for Datastream. Note that there is no specific timeframe for issue resolution; a request is typically picked up once it has gathered a handful of stars and comments from several users explaining how it would be useful.

I hope this helps!