
Datastream to BigQuery Dataflow template problem

I have a Datastream stream that replicates data from MySQL to Cloud Storage in Avro format, with Pub/Sub notifications enabled. I created a Dataflow job from the existing "Datastream to BigQuery" template to continuously stream the data into BigQuery. The problem I am facing is that the data is never loaded into the final (replica) table, although I can see it in the staging table. Does anyone know what the issue is?
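For context, a pipeline like the one described above is typically launched from the "Datastream to BigQuery" flex template roughly as follows. This is a sketch: the project, bucket, subscription, and dataset names are placeholders, and the parameter names should be verified against the current template reference before use.

```shell
# Launch the Datastream-to-BigQuery flex template (illustrative values only).
gcloud dataflow flex-template run datastream-to-bq-job \
  --project=my-project \
  --region=us-central1 \
  --template-file-gcs-location=gs://dataflow-templates-us-central1/latest/flex/Cloud_Datastream_to_BigQuery \
  --parameters \
inputFilePattern=gs://my-bucket/datastream-output/,\
inputFileFormat=avro,\
gcsPubSubSubscription=projects/my-project/subscriptions/datastream-notifications,\
outputStagingDatasetTemplate=datastream_staging,\
outputDatasetTemplate=datastream_replica,\
deadLetterQueueDirectory=gs://my-bucket/dlq/
```

Comparing the parameters of the failing job against a known-good invocation like this (in particular the staging vs. output dataset templates) is a quick first sanity check.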

Solved
1 ACCEPTED SOLUTION

If you are having trouble loading data from Datastream to BigQuery using a Dataflow job, there are a few things you can check:

  • Datastream stream logs: Check the Datastream stream logs for any errors, warnings, or unusual patterns. For example, you may see errors indicating that the stream is unable to connect to the MySQL database, or that it is unable to write data to Cloud Storage.
  • BigQuery job logs: Check the BigQuery job logs for any errors, especially if the Dataflow job is using batch loads into BigQuery. For example, you may see errors indicating that the BigQuery table does not exist, or that the Dataflow job does not have permission to write to it.
  • Debug mode: Running the Dataflow job in debug mode can help identify unexpected behavior in the pipeline. Note, however, that doing so in a production environment can have performance implications.
  • Dataflow region and BigQuery dataset region: For best performance and cost, ensure that the Dataflow job and the BigQuery dataset are in the same region. Dataflow can write to BigQuery datasets in other regions, but this is not recommended.
  • Partitioned tables: Dataflow does support writing to partitioned tables in BigQuery. Ensure that the partitioning column and type are configured consistently in both the Dataflow job and the BigQuery table.
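For the staging-table-but-no-replica-table symptom specifically, it helps to know that the template periodically merges rows from the staging table into the replica table using BigQuery MERGE statements. The sketch below shows the general shape of such a statement; the table and column names are hypothetical, and the real template additionally handles deletes and change-ordering metadata.

```python
def build_merge_sql(staging_table, replica_table, key_cols, value_cols):
    """Build a BigQuery-style MERGE that upserts staging rows into the replica.

    Illustrative sketch only: the actual Datastream-to-BigQuery template
    generates a more elaborate statement, but the upsert pattern is the same.
    """
    on_clause = " AND ".join(f"T.{c} = S.{c}" for c in key_cols)
    set_clause = ", ".join(f"T.{c} = S.{c}" for c in value_cols)
    all_cols = ", ".join(key_cols + value_cols)
    src_cols = ", ".join(f"S.{c}" for c in key_cols + value_cols)
    return (
        f"MERGE `{replica_table}` T\n"
        f"USING `{staging_table}` S\n"
        f"ON {on_clause}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({all_cols}) VALUES ({src_cols})"
    )

sql = build_merge_sql(
    "proj.datastream_staging.orders_log",
    "proj.datastream_replica.orders",
    key_cols=["order_id"],
    value_cols=["status", "amount"],
)
print(sql)
```

If these merge jobs are failing (for example, due to missing BigQuery permissions on the replica dataset), you would see rows accumulate in staging while the replica table stays empty, so the BigQuery job history is the right place to look.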

If you have checked all of the above and you are still having trouble, please provide me with the following information:

  • The name of your Dataflow job
  • The name of your BigQuery dataset and table
  • The Dataflow job logs
  • The BigQuery job logs

Redacting sensitive information: When sharing logs, please redact or remove any sensitive information, such as passwords, database connection strings, and personally identifiable information (PII). It is also best to share logs directly with support or trusted individuals, rather than in public forums.
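As a minimal illustration of the kind of redaction meant here, a small script can mask the most common secret patterns before a log line is shared. The patterns below are examples only, not exhaustive; always review the output by eye before posting.

```python
import re

def redact(log_line):
    """Mask common secrets in a log line before sharing it publicly."""
    # Mask password-style key=value pairs.
    line = re.sub(r"(password|passwd|pwd)=\S+", r"\1=[REDACTED]",
                  log_line, flags=re.IGNORECASE)
    # Mask credentials embedded in connection strings like user:pass@host.
    line = re.sub(r"//([^:/\s]+):([^@\s]+)@", r"//\1:[REDACTED]@", line)
    # Mask email addresses (a simple PII example).
    line = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", line)
    return line

print(redact("jdbc:mysql://admin:s3cret@db.example.com/orders "
             "password=hunter2 contact=ops@example.com"))
```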
