Hi, I have 2 datastream from the same mysql source that ingest data to bigquery and google cloud storage (accessed in bigquery using external table).
I found a lot of times, there are missing rows in bigquery table. So for example, the row count in source & destination are like this:
table | row count |
mysql (source) | 100 |
bigquery | 86 |
cloud storage | 100 |
To temporarily fix this, I usually do a backfill for the datastream to bigquery. However as we have no idea what the root cause is, it's keep happinening. Wondering if anyone have the same experience, and what can I do to fix the issue for the data that are directly ingested to bigquery.
Hi @brown_squirrel,
Welcome to Google Cloud Community!
Discrepancies on Datastream between your source(MySQL) and destinations(Cloud Storage and BigQuery) indicate a possible problem on your ingestion pipeline specifically to BigQuery. Datastream missing data with BigQuery as a destination can be due to various reasons. Here are some potential causes and suggestions that may help resolve the issue:
If the issue persists, I recommend reaching out to Google Cloud Support for further assistance, as they can provide insights into whether this behavior is specific to your project.
I hope the above information is helpful.