Hello everyone!
I could really use some assistance!
I've set up a pipeline to copy data from my managed SQL (PostgreSQL) on GCP to BigQuery. I followed these guides:
I successfully created a Dataflow template as outlined here: Datastream to BigQuery Template
The pipeline requires two BigQuery datasets: the main dataset and a staging dataset. After getting the pipeline running, I started seeing data being ingested into the staging dataset. The pipeline auto-created columns and added the _metadata_* fields, but it did not add the _metadata_uuid and _metadata_lsn fields, which is causing the merge step to fail.
I checked the data in the GCS bucket that holds the delta files, and I can see the LSN field present in the metadata JSON.
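For anyone wanting to run the same check, a minimal sketch of inspecting one delta-file line locally. The record layout below is an illustrative assumption (Datastream's JSON output puts source-specific fields such as `lsn` under a `source_metadata` block, with `uuid` at the top level); the exact field names in your files may differ:

```python
import json

# Sketch: sanity-check a Datastream delta-file record downloaded from the
# GCS bucket. The sample line below is hypothetical; real files are JSONL
# with one change event per line.
sample_line = (
    '{"uuid": "9b1f0000-0000-0000-0000-000000000000", '
    '"source_metadata": {"lsn": 123456789, "schema": "public", "table": "orders"}}'
)

event = json.loads(sample_line)
meta = event.get("source_metadata", {})

# These are the two values the merge step needs to find in BigQuery.
print("lsn present:", "lsn" in meta)
print("uuid present:", "uuid" in event)
```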
Does anyone know why the _metadata_uuid and _metadata_lsn fields were not auto-created in the BigQuery datasets? Any help would be greatly appreciated!
Thanks in advance!
Note: I was able to get the merge step to work by manually adding the _metadata_uuid and _metadata_lsn columns, but this feels wrong, especially since I have dozens of tables.
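If the manual workaround is the only option for now, it can at least be scripted rather than clicked through table by table. A minimal sketch that generates the `ALTER TABLE` DDL for a list of tables, assuming the missing columns are `_metadata_uuid STRING` and `_metadata_lsn INT64` (verify the types your template version actually expects before running; the project/dataset/table names here are placeholders):

```python
# Sketch: generate BigQuery DDL to backfill the metadata columns the
# Datastream-to-BigQuery template expects for a PostgreSQL source.
# Assumption: the columns are _metadata_uuid (STRING) and _metadata_lsn (INT64).

def make_backfill_ddl(project: str, dataset: str, tables: list[str]) -> list[str]:
    """Return one ALTER TABLE statement per table, idempotent via IF NOT EXISTS."""
    statements = []
    for table in tables:
        statements.append(
            f"ALTER TABLE `{project}.{dataset}.{table}`\n"
            f"  ADD COLUMN IF NOT EXISTS _metadata_uuid STRING,\n"
            f"  ADD COLUMN IF NOT EXISTS _metadata_lsn INT64;"
        )
    return statements

# Example: print statements to paste into the BigQuery console
# (or run via the bq CLI / client library).
for ddl in make_backfill_ddl("my-project", "my_staging_dataset", ["orders", "customers"]):
    print(ddl)
```

Running the same statements against both the staging and the main dataset should keep dozens of tables consistent without hand-editing each schema.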
The actual error message the job produces (for every table):
Merge Job Failed With BigQuery Exception: com.google.cloud.bigquery.BigQueryException: Query error: Unrecognized name: _metadata_lsn; Did you mean _metadata_ssn? at [1:627] Statement: BEGIN BEGIN TRANSACTION; MERGE .....
The _metadata fields that were auto created are:
"_metadata_timestamp", "_metadata_read_timestamp", "_metadata_read_method", "_metadata_source_type", "_metadata_deleted", "_metadata_change_type", "_metadata_row_id", "_metadata_scn" (<-- isn't this an Oracle field?), "_metadata_ssn", "_metadata_rs_id"
Hi @ms4446 !
I am tagging you here because I saw similar posts where you helped! Would really appreciate any help/hints you might provide here 🙏
Since you're using Datastream to replicate data from PostgreSQL to BigQuery, it's essential to ensure that your Datastream configuration is set up to correctly capture all relevant metadata fields. Also check whether the _metadata_uuid and _metadata_lsn fields are set in your Datastream destination configuration: Datastream automatically creates fields such as _metadata_timestamp and _metadata_read_timestamp, but source-specific fields like the ones I mentioned may or may not be included.
If you've verified your configuration and are still encountering issues, it's best to reach out to Google Cloud Support.
Thank you for the response, Zinia. But I think Datastream is set up correctly: I'm looking at the JSON objects in the bucket, and they include the "lsn" field, which makes me think it's Dataflow itself that isn't set up right.
I'm facing the same issue. Any update?