Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Datastream Issues

We are encountering a problem with the synchronization of data from MySQL to BigQuery through Datastream. An error has been detected, and we are uncertain about the appropriate resolution. The metadata issue remains unresolved.Error:
Discarded 2554 unsupported events for BigQuery destination: 437839915303.dataset.table, with reason code: BIGQUERY_APPEND_FAILED, details: Datastream is unable to write the event to table 'table' due to an error from BigQuery. Original error: 'Field root.datastream_metadata failed to convert to STRUCT. Error: The map of row index to error message is {0=The source object has fields unknown to BigQuery: root.datastream_metadata.is_deleted.}'

Solved Solved
1 3 924
1 ACCEPTED SOLUTION

Datastream appends metadata, such as root.datastream_metadata.is_deleted, to records it sends to BigQuery. This metadata is essential for tracking changes and replication status. The issue arises because BigQuery cannot interpret this specific metadata field as a valid STRUCT, indicating a schema mismatch. The error specifically points to the is_deleted field within datastream_metadata, which BigQuery does not expect or is not correctly defined in the table schema. As a result of this schema incompatibility, Datastream is unable to write these events into the BigQuery destination table, leading to the discarding of affected events. This can result in data synchronization issues or incomplete data replication.

Possible Causes & Resolutions

  • Recent Datastream Changes: If there have been recent updates to your Datastream configuration or the structure of the source database, these could introduce fields that are incompatible with the existing BigQuery table schema. Solution: Review and adjust your BigQuery table schema or Datastream configuration to ensure compatibility with new metadata fields.

  • Evolving Schema with Backfill: If a backfill operation was performed after modifying the stream's source or schema, it might introduce historical records with the problematic metadata field, even if your current configuration is correct. Solution: Consider re-creating the BigQuery table with a schema that matches your current data stream's structure, or adjust the stream to temporarily exclude incompatible fields until the BigQuery schema can be updated.

  • Complex Data Types: While the specific error message does not directly relate to complex data types but rather to the metadata structure, ensuring that data types in MySQL are directly compatible with BigQuery's supported types is crucial. Solution: Simplify data types in MySQL where possible, or apply transformations within Datastream to ensure compatibility with BigQuery.

Troubleshooting Steps

  • Examine Source Data: Investigate your MySQL database to understand why the root.datastream_metadata.is_deleted field is included, particularly in the context of how deletions are managed in your source data.

  • Schema Verification: Carefully compare the schema of your source MySQL table with that of your destination BigQuery table. Ensure that all fields, especially metadata fields added by Datastream, are accounted for and compatible in the BigQuery schema.

  • Datastream Logs: Review Datastream logs for additional error messages or insights that could shed light on the root cause of the issue.

 

View solution in original post

3 REPLIES 3

Datastream appends metadata, such as root.datastream_metadata.is_deleted, to records it sends to BigQuery. This metadata is essential for tracking changes and replication status. The issue arises because BigQuery cannot interpret this specific metadata field as a valid STRUCT, indicating a schema mismatch. The error specifically points to the is_deleted field within datastream_metadata, which BigQuery does not expect or is not correctly defined in the table schema. As a result of this schema incompatibility, Datastream is unable to write these events into the BigQuery destination table, leading to the discarding of affected events. This can result in data synchronization issues or incomplete data replication.

Possible Causes & Resolutions

  • Recent Datastream Changes: If there have been recent updates to your Datastream configuration or the structure of the source database, these could introduce fields that are incompatible with the existing BigQuery table schema. Solution: Review and adjust your BigQuery table schema or Datastream configuration to ensure compatibility with new metadata fields.

  • Evolving Schema with Backfill: If a backfill operation was performed after modifying the stream's source or schema, it might introduce historical records with the problematic metadata field, even if your current configuration is correct. Solution: Consider re-creating the BigQuery table with a schema that matches your current data stream's structure, or adjust the stream to temporarily exclude incompatible fields until the BigQuery schema can be updated.

  • Complex Data Types: While the specific error message does not directly relate to complex data types but rather to the metadata structure, ensuring that data types in MySQL are directly compatible with BigQuery's supported types is crucial. Solution: Simplify data types in MySQL where possible, or apply transformations within Datastream to ensure compatibility with BigQuery.

Troubleshooting Steps

  • Examine Source Data: Investigate your MySQL database to understand why the root.datastream_metadata.is_deleted field is included, particularly in the context of how deletions are managed in your source data.

  • Schema Verification: Carefully compare the schema of your source MySQL table with that of your destination BigQuery table. Ensure that all fields, especially metadata fields added by Datastream, are accounted for and compatible in the BigQuery schema.

  • Datastream Logs: Review Datastream logs for additional error messages or insights that could shed light on the root cause of the issue.

 

Hello 👋 I have the same issue in Datastream.

Initially I had a table without primary key when backfilling into BigQuery. At this moment, I had the datastream_metadata.is_deleted on the table. So I understood that Datastream writing mode was set to Append-mode.

In order to switch to Merge-mode. I tried to add 2 primary keys in my source SQL table. I deleted the BigQuery destination table and then I applied a backfill in Datastream.

It created a new BigQuery destination table without the is_deleted metadata. But then I had this error because Datastream doesn't understand that we are in Merge-mode and not in Append-mode. 
Even if I pause the stream, delete the BigQuery destination table, rerun the stream and backfill the data. It still sends the is_deleted when trying to add in BigQuery destination table.

Should I re-create my stream in order to reset the configuration of write mode to Merge-mode made by Datastream?

When Datastream recreate the destination table seeing that it has primary keys, it should set the write mode to Merge-mode even if it was previously in Append-mode. Is there a workaround without re-creating the stream?

Datastream's tendency to retain its initial append-mode configuration, even after changes to the source and destination tables, can be a significant obstacle when transitioning to merge-mode. This persistence may result in errors, such as the unexpected inclusion of the is_deleted metadata field when writing data to BigQuery, despite adding primary keys to the source table and deleting the existing BigQuery destination table.

The most straightforward and reliable solution to address this issue is to recreate the Datastream. Although this approach requires reconfiguring settings, it ensures that Datastream correctly recognizes the new merge-mode configuration based on the primary keys in the source table.

Alternatively, modifying Datastream's metadata directly to force recognition of merge-mode is a more advanced workaround that should be approached with caution. Errors in this process could lead to further data synchronization problems.

If neither of these solutions is feasible or if there is uncertainty about modifying metadata directly, contacting Google Cloud Support is advisable. They can offer expert guidance or even reset the stream configuration for you.