I have a stream setup with Datastream where a PostgreSQL database on Cloud SQL replicates to BigQuery. When data is inserted to the PostgreSQL database, replication to BigQuery has been working just fine.
I went to delete a row from the PostgreSQL database (after running ALTER TABLE my_table REPLICA IDENTITY FULL;), but the delete is not being replicated and Datastream returns an error:
event_code: "UNSUPPORTED_EVENTS_DISCARDED" message: "Discarded 1 unsupported events for BigQuery destination: project_id.dataset.my_table, with reason code: BIGQUERY_UNSUPPORTED_PRIMARY_KEY_CHANGE, details: Failed to write to BigQuery due to an unsupported primary key change: adding primary keys to existing tables is not supported.." object_name: "my_table"
So now I have 179 rows in the PostgreSQL database and 180 in BigQuery because the delete isn't being replicated. I'm not sure what is going on, since I did not change anything with primary keys and I'm not adding any primary keys...
The error message indicates that Datastream encountered an issue with replicating the DELETE operation to BigQuery due to an unsupported primary key change. While BigQuery does not enforce primary key constraints like traditional databases, Datastream uses a primary key or unique identifier to track changes between the source and destination.
The ALTER TABLE statement you executed in PostgreSQL set the REPLICA IDENTITY to FULL, ensuring that the entire row is captured in change events. However, this does not address the primary key discrepancy in BigQuery.
To resolve this issue, ensure that the source and destination schemas are in sync and that any changes to the source schema are appropriately reflected in the destination.
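If it helps, you can double-check how the source table is configured before restarting the stream. A minimal sketch, assuming the table is named my_table (relreplident returns 'd' for DEFAULT, 'f' for FULL, 'n' for NOTHING, 'i' for USING INDEX):
-- Check the replica identity currently set on the source table
SELECT relreplident FROM pg_class WHERE oid = 'my_table'::regclass;
-- List the primary key constraint (if any) that Datastream can use
SELECT conname, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 'my_table'::regclass AND contype = 'p';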
Thank you for the response. I created a primary key in the PostgreSQL table, but now when I start the stream I get a new error during the backfill:
Datastream can't create a table in BigQuery, because the table's primary key has a column with an unsupported data type.
I created the Primary Key in PostgreSQL with:
ALTER TABLE stocks ADD COLUMN id SERIAL PRIMARY KEY
Is this not the correct way to create a primary key for BigQuery?
The SERIAL data type in PostgreSQL is not directly supported as a primary key data type for replication to BigQuery using Datastream. Instead, you can use the BIGINT data type in PostgreSQL, which corresponds to INT64 in BigQuery.
To create a primary key with the BIGINT data type in PostgreSQL, you can use the following ALTER TABLE statement:
ALTER TABLE stocks ADD COLUMN id BIGINT PRIMARY KEY;
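If the stocks table already has rows in it, an alternative sketch (assuming a reasonably recent PostgreSQL version, 10+) is to add the column as a BIGINT identity column, which backfills existing rows automatically, and then promote it to the primary key:
-- Add a BIGINT identity column; existing rows are assigned values automatically
ALTER TABLE stocks ADD COLUMN id BIGINT GENERATED ALWAYS AS IDENTITY;
-- Make the new column the table's primary key
ALTER TABLE stocks ADD PRIMARY KEY (id);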
Thank you for your help. It looks like everything is working well now after another attempted delete. Going forward, I think I will set the replica identity back to DEFAULT:
ALTER TABLE stocks REPLICA IDENTITY DEFAULT;
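With DEFAULT, only the primary key columns are logged in the old-row image for updates and deletes, so it relies on the id primary key added above. A quick sanity check after switching back (a sketch, assuming the table name stocks):
-- 'd' = DEFAULT, i.e. the primary key is used as the replica identity
SELECT relreplident FROM pg_class WHERE oid = 'stocks'::regclass;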