When data of events_intraday table get migrated to the evets_dataset on BigQuery. As we have two tables events_intraday ( todays and yesterdays ) .
I have a cron job which extracting table data from bigQuery to GoogleCloudStorage Bucket using BQ Extract command.
Some time its throwing error that events_(date) table not found.
So I need to know exactly when events_intraday table push data to events table.
Is there any configuration where I can view/edit timestamp of migration.
Solved! Go to Solution.
Unfortunately, there is no configuration option to view or edit the timestamp of migration for the events_intraday table to the events table in BigQuery. This is a process managed by Google and is not exposed to users for modification.
The reason for this is that the migration of the events_intraday table to the events table is a complex process that requires careful coordination between Google Cloud and Firebase. If users were able to view or edit the timestamp of migration, this could potentially disrupt the migration process and lead to data loss.
Yes, there are a few workarounds you can consider:
Error Handling in Your Script: You could add error handling in your script to catch the "table not found" error. When this error is caught, you could have your script wait for a certain period of time and then retry the operation. This would give the data migration process more time to complete.
Use Both events_intraday_
and events_
Tables: If you need more real-time access to your data, you could modify your script to extract data from both the events_intraday_
and events_
tables. This would allow you to access the most recent data in the events_intraday_
table, as well as the historical data in the events_
table.
Use Google Cloud Functions: You could use Google Cloud Functions to trigger your data extraction process based on changes in your BigQuery dataset. For example, you could set up a Cloud Function to be triggered whenever a new table is added to your dataset. This would ensure that your data extraction process only runs after the data migration has occurred.
Unfortunately, there is no configuration option to view or edit the timestamp of migration for the events_intraday table to the events table in BigQuery. This is a process managed by Google and is not exposed to users for modification.
The reason for this is that the migration of the events_intraday table to the events table is a complex process that requires careful coordination between Google Cloud and Firebase. If users were able to view or edit the timestamp of migration, this could potentially disrupt the migration process and lead to data loss.
Yes, there are a few workarounds you can consider:
Error Handling in Your Script: You could add error handling in your script to catch the "table not found" error. When this error is caught, you could have your script wait for a certain period of time and then retry the operation. This would give the data migration process more time to complete.
Use Both events_intraday_
and events_
Tables: If you need more real-time access to your data, you could modify your script to extract data from both the events_intraday_
and events_
tables. This would allow you to access the most recent data in the events_intraday_
table, as well as the historical data in the events_
table.
Use Google Cloud Functions: You could use Google Cloud Functions to trigger your data extraction process based on changes in your BigQuery dataset. For example, you could set up a Cloud Function to be triggered whenever a new table is added to your dataset. This would ensure that your data extraction process only runs after the data migration has occurred.
Under our understanding, the intraday events is a copy of the events data at a point in time, and the events data should capture the full activity records. However, for some days we see that the intraday events records seem higher than the events data . Do we have a possible explanation of why this is the case? Can records get deleted between intraday and the events data?
Yes, the scenario you described can happen, and there are several potential explanations for why the intraday
events records might be higher than the events
data for some days:
Data Deduplication:
intraday
table captures raw events, including duplicates or erroneous entries. When migrating this data to the events
table, a deduplication process might be applied, removing any duplicate records. This would result in a reduced count in the events
table compared to the intraday
table.Data Filtering:
intraday
to events
. This could involve removing records that don't meet certain criteria, such as records with missing fields or records that are deemed invalid or irrelevant.Data Retention Policies:
events
table has a shorter retention policy than the intraday
table, older records might be deleted from the events
table, leading to a discrepancy in record counts.Streaming Data:
intraday
table and when it's available for querying. This could lead to temporary discrepancies in record counts between the two tables.Manual Interventions:
events
table after the data migration from intraday
. This could be due to data correction, compliance requirements, or other reasons.Errors in Migration Process:
intraday
to events
, there might be errors or bugs in this process that result in data loss or discrepancies.Partitioned Tables:
events
table is partitioned (e.g., by date), and you're querying a specific partition, ensure that you're comparing the correct partitions between the intraday
and events
tables. A mismatch in queried partitions can lead to discrepancies in record counts.To determine the exact cause, you'd need to review the processes and configurations related to data migration, retention, and querying in your BigQuery setup. If records are indeed getting deleted between intraday
and events
, it's crucial to identify the reason to ensure data integrity and accuracy.