We use Datastream to replicate our Postgres database into BigQuery CDC tables. For several months this ran fine with a max staleness of 15 minutes. For the past few days, however, BigQuery has been complaining that upserts are not applied within the max staleness period. This happens when we use Dataproc to write tables that are based on these BQ tables, and it causes those writes to fail.
This has now happened several days in a row, at different times of the day. Interestingly, the affected tables had no upsert events, so there was nothing to apply. The streaming buffer for these tables is also empty.
The problem only goes away when one of the following happens:
* A new upsert is made on the table. This seems to "unstick" the table.
* We manually force a backfill on the table (definitely not ideal).
* We remove all data sources and restart Datastream (even less ideal, and the problem returns).
Affected services: BigQuery, Datastream
Source data: Cloud SQL Postgres
Destination: BigQuery
Region: europe-west1
Is anyone else facing these problems? If more details are required, please let me know.
Yes, there have been other reports of BigQuery CDC upserts not being applied within the max staleness period. This seems to be a relatively new issue, and Google Cloud is investigating it.
In the meantime, there are a few things you can try to work around the issue:
If you are still having problems, you can file a report using this link.