Hello everyone!
I could really use some assistance!
I've set up a pipeline to copy data from my managed SQL (PostgreSQL) on GCP to BigQuery. I followed these guides:
I successfully created a Dataflow template as outlined here: Datastream to BigQuery Template
The pipeline requires two BigQuery datasets: the main dataset and a staging dataset. After getting the pipeline running, I started seeing data being ingested into the staging dataset. The pipeline auto-created columns and added the _metadata_* fields, but it did not add the _metadata_uuid and _metadata_lsn fields, which is causing the merge step to fail.
I checked the data in the GCS bucket that holds the delta files, and I can see the LSN field present in the metadata JSON.
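For anyone wanting to run the same check, a minimal sketch of inspecting one delta-file line locally. The record layout below is an illustrative assumption (Datastream's JSON output puts source-specific fields such as `lsn` under a `source_metadata` block, with `uuid` at the top level); the exact field names in your files may differ:

```python
import json

# Sketch: sanity-check a Datastream delta-file record downloaded from the
# GCS bucket. The sample line below is hypothetical; real files are JSONL
# with one change event per line.
sample_line = (
    '{"uuid": "9b1f0000-0000-0000-0000-000000000000", '
    '"source_metadata": {"lsn": 123456789, "schema": "public", "table": "orders"}}'
)

event = json.loads(sample_line)
meta = event.get("source_metadata", {})

# These are the two values the merge step needs to find in BigQuery.
print("lsn present:", "lsn" in meta)
print("uuid present:", "uuid" in event)
```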
Does anyone know why the _metadata_uuid and _metadata_lsn fields were not auto-created in the BigQuery datasets? Any help would be greatly appreciated!
Thanks in advance!
Note: I was able to get the merge step to work by manually adding the _metadata_uuid and _metadata_lsn columns, but this feels wrong, especially since I have dozens of tables.
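If the manual workaround is the only option for now, it can at least be scripted rather than clicked through table by table. A minimal sketch that generates the `ALTER TABLE` DDL for a list of tables, assuming the missing columns are `_metadata_uuid STRING` and `_metadata_lsn INT64` (verify the types your template version actually expects before running; the project/dataset/table names here are placeholders):

```python
# Sketch: generate BigQuery DDL to backfill the metadata columns the
# Datastream-to-BigQuery template expects for a PostgreSQL source.
# Assumption: the columns are _metadata_uuid (STRING) and _metadata_lsn (INT64).

def make_backfill_ddl(project: str, dataset: str, tables: list[str]) -> list[str]:
    """Return one ALTER TABLE statement per table, idempotent via IF NOT EXISTS."""
    statements = []
    for table in tables:
        statements.append(
            f"ALTER TABLE `{project}.{dataset}.{table}`\n"
            f"  ADD COLUMN IF NOT EXISTS _metadata_uuid STRING,\n"
            f"  ADD COLUMN IF NOT EXISTS _metadata_lsn INT64;"
        )
    return statements

# Example: print statements to paste into the BigQuery console
# (or run via the bq CLI / client library).
for ddl in make_backfill_ddl("my-project", "my_staging_dataset", ["orders", "customers"]):
    print(ddl)
```

Running the same statements against both the staging and the main dataset should keep dozens of tables consistent without hand-editing each schema.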
The actual error message the job produces (for every table):
Merge Job Failed With BigQuery Exception: com.google.cloud.bigquery.BigQueryException: Query error: Unrecognized name: _metadata_lsn; Did you mean _metadata_ssn? at [1:627] Statement: BEGIN BEGIN TRANSACTION; MERGE .....
The _metadata fields that were auto created are:
"_metadata_timestamp", "_metadata_read_timestamp", "_metadata_read_method", "_metadata_source_type", "_metadata_deleted", "_metadata_change_type", "_metadata_row_id", "_metadata_scn" (<-- isn't this an Oracle field?), "_metadata_ssn", "_metadata_rs_id"
Hi @ms4446 !
I am tagging you here because I saw similar posts where you helped! Would really appreciate any help/hints you might provide here 🙏
Since you're using Datastream to replicate data from PostgreSQL to BigQuery, it's essential to ensure that your Datastream configuration is set up to correctly capture all relevant metadata fields. Also check whether the _metadata_uuid and _metadata_lsn fields are set in your Datastream destination configuration: Datastream automatically creates fields such as _metadata_timestamp and _metadata_read_timestamp, but source-specific fields like the ones I mentioned may or may not be included.
If you've verified your configuration and are still encountering issues, it's best to reach out to Google Cloud Support.
Thank you for the response, Zinia. But I think Datastream is set up correctly: I'm looking at the JSON objects in the bucket, and they include the "lsn" field, which makes me think it's Dataflow itself that isn't set up right.
I'm facing the same issue. Any update?