Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Google Cloud's Datastream: Not replicating deletes to BigQuery

I have a stream setup with Datastream where a PostgreSQL database on Cloud SQL replicates to BigQuery. When data is inserted to the PostgreSQL database, replication to BigQuery has been working just fine.

I went to delete a row from the PostgreSQL database (after running: ALTER TABLE my_table REPLICA IDENTITY FULL; and it's not being replicated and returning an error:

event_code: "UNSUPPORTED_EVENTS_DISCARDED"
message: "Discarded 1 unsupported events for BigQuery destination: project_id.dataset.my_table, with reason code: BIGQUERY_UNSUPPORTED_PRIMARY_KEY_CHANGE, details: Failed to write to BigQuery due to an unsupported primary key change: adding primary keys to existing tables is not supported.."
object_name: "my_table"

So now, I have 179 rows in the PostgreSQL database and 180 in BigQuery because it's not replicating the delete. Not sure what is going on since I did not change anything with Primary Keys and I'm not adding any Primary Keys...

Solved Solved
0 4 3,047
1 ACCEPTED SOLUTION

The error message indicates that Datastream encountered an issue with replicating the DELETE operation to BigQuery due to an unsupported primary key change. While BigQuery does not enforce primary key constraints like traditional databases, Datastream uses a primary key or unique identifier to track changes between the source and destination.

The ALTER TABLE statement you executed in PostgreSQL set the REPLICA IDENTITY to FULL, ensuring that the entire row is captured in change events. However, this does not address the primary key discrepancy in BigQuery.

To resolve this issue:

  1. Ensure that the PostgreSQL table has a primary key defined, and that this primary key or unique identifier is also present in the BigQuery table.
  2. Review your Datastream connection settings to ensure that it's configured to handle deletes correctly.
  3. If there are discrepancies in data, consider manually syncing the rows between PostgreSQL and BigQuery to ensure consistency.

It's essential to ensure that the source and destination schemas are in sync and that any changes to the source schema are appropriately reflected in the destination.

View solution in original post

4 REPLIES 4