
BigQuery CDC upserts not applied

We use Datastream to replicate (CDC) our Postgres database into BigQuery CDC tables. For several months this ran fine with the max staleness option set to 15 minutes. For the past few days, however, BigQuery has been complaining that upserts are not applied within the max staleness period. This happens when we write tables through Dataproc that are based on these BigQuery tables, and it causes writes to downstream tables to fail.

This has now happened several days in a row, at different times of the day. Interestingly, the affected tables had no upsert events, so there was nothing to apply. The streaming buffer for these tables is also empty.

This only gets resolved if we:
* Perform a new upsert on the table. This seems to unstick the table.
* Manually force a backfill on the table (definitely not ideal).
* Remove all data sources and restart Datastream (even less ideal, and the problem returns).
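
For anyone trying to reproduce the observation that nothing was pending on a "stuck" table, here is a minimal Python sketch. It assumes that `INFORMATION_SCHEMA.TABLES` exposes an `upsert_stream_apply_watermark` column for CDC-enabled tables (please verify this against the current BigQuery documentation). The helper names `apply_watermark_query` and `upserts_overdue` are hypothetical and not part of any library.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def apply_watermark_query(project: str, dataset: str, table: str) -> str:
    """Build a query for the time up to which upserts have been applied.

    Assumption: the upsert_stream_apply_watermark column exists in
    INFORMATION_SCHEMA.TABLES for CDC-enabled tables.
    """
    return (
        "SELECT upsert_stream_apply_watermark "
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.TABLES` "
        f"WHERE table_name = '{table}'"
    )


def upserts_overdue(
    watermark: Optional[datetime],
    now: datetime,
    max_staleness: timedelta,
) -> bool:
    """Return True when the watermark is older than max_staleness.

    A missing watermark (None) is treated as "nothing to compare",
    which is a choice made for this sketch only.
    """
    if watermark is None:
        return False
    return now - watermark > max_staleness


if __name__ == "__main__":
    # Placeholder identifiers; replace with your own project, dataset, table.
    print(apply_watermark_query("my-project", "my_dataset", "my_table"))
    now = datetime.now(timezone.utc)
    print(upserts_overdue(now - timedelta(minutes=20), now, timedelta(minutes=15)))
```

The query string can be run in the BigQuery console or through any client. Comparing the returned timestamp with the 15-minute max staleness shows whether the table is actually behind or only reported as such.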

Affected services: BigQuery, Datastream
Source data: Cloud SQL Postgres
Destination: BigQuery
Region: europe-west1

Is anyone else facing these problems? If more details are required, please let me know.

1 REPLY

Yes, there have been other reports of BigQuery CDC upserts not being applied in the max staleness period. This seems to be a relatively new issue, and Google Cloud is investigating it.

In the meantime, there are a few things you can try to work around the issue:

  • Increase the max staleness period. This will give BigQuery more time to apply the upserts. However, it will also make your data more stale.
  • Force a backfill on the table. This will force BigQuery to reapply all of the upserts for the table. However, this can be a time-consuming operation, especially for large tables.
  • Use a different CDC solution. There are a number of third-party CDC solutions that work with BigQuery. You may want to try using one of these solutions instead of Datastream.
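
As a rough illustration of the first workaround, the sketch below builds the DDL statement that raises `max_staleness` on a table. The function name `max_staleness_ddl` and the table ID in the usage example are placeholders I made up. The `ALTER TABLE ... SET OPTIONS (max_staleness = ...)` form is what I would expect from the BigQuery DDL reference, but please check it against the docs before running it.

```python
def max_staleness_ddl(table_id: str, minutes: int) -> str:
    """Build an ALTER TABLE statement that sets max_staleness in minutes.

    table_id is expected in "project.dataset.table" form (hypothetical
    helper for illustration, not part of any Google library).
    """
    if minutes <= 0:
        raise ValueError("minutes must be a positive integer")
    return (
        f"ALTER TABLE `{table_id}` "
        f"SET OPTIONS (max_staleness = INTERVAL {minutes} MINUTE)"
    )


if __name__ == "__main__":
    # Placeholder table ID; replace with your own.
    print(max_staleness_ddl("my-project.my_dataset.my_table", 30))
```

The resulting statement can be pasted into the BigQuery console or submitted through a client library. Keep in mind that a larger value only gives BigQuery more room to apply upserts; it does not explain why a table with no pending upserts gets stuck.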

If you are still having problems, you can file a report at this link.