Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Events intraday migration to events table in BigQuery

When data of events_intraday  table get migrated to the evets_dataset on BigQuery. As we have two tables  events_intraday ( todays and yesterdays ) .
I have a cron job which extracting table data from bigQuery to GoogleCloudStorage Bucket using BQ Extract command.
Some time its throwing error that events_(date) table not found.
So I need to  know exactly when events_intraday table push data to events table.
Is there any configuration where I can view/edit timestamp of migration.

 

Solved Solved
0 3 2,230
1 ACCEPTED SOLUTION

Unfortunately, there is no configuration option to view or edit the timestamp of migration for the events_intraday table to the events table in BigQuery. This is a process managed by Google and is not exposed to users for modification.

The reason for this is that the migration of the events_intraday table to the events table is a complex process that requires careful coordination between Google Cloud and Firebase. If users were able to view or edit the timestamp of migration, this could potentially disrupt the migration process and lead to data loss.

Yes, there are a few workarounds you can consider:

  1. Error Handling in Your Script: You could add error handling in your script to catch the "table not found" error. When this error is caught, you could have your script wait for a certain period of time and then retry the operation. This would give the data migration process more time to complete.

  2. Use Both events_intraday_ and events_ Tables: If you need more real-time access to your data, you could modify your script to extract data from both the events_intraday_ and events_ tables. This would allow you to access the most recent data in the events_intraday_ table, as well as the historical data in the events_ table.

  3. Use Google Cloud Functions: You could use Google Cloud Functions to trigger your data extraction process based on changes in your BigQuery dataset. For example, you could set up a Cloud Function to be triggered whenever a new table is added to your dataset. This would ensure that your data extraction process only runs after the data migration has occurred.

 

 

View solution in original post

3 REPLIES 3

Unfortunately, there is no configuration option to view or edit the timestamp of migration for the events_intraday table to the events table in BigQuery. This is a process managed by Google and is not exposed to users for modification.

The reason for this is that the migration of the events_intraday table to the events table is a complex process that requires careful coordination between Google Cloud and Firebase. If users were able to view or edit the timestamp of migration, this could potentially disrupt the migration process and lead to data loss.

Yes, there are a few workarounds you can consider:

  1. Error Handling in Your Script: You could add error handling in your script to catch the "table not found" error. When this error is caught, you could have your script wait for a certain period of time and then retry the operation. This would give the data migration process more time to complete.

  2. Use Both events_intraday_ and events_ Tables: If you need more real-time access to your data, you could modify your script to extract data from both the events_intraday_ and events_ tables. This would allow you to access the most recent data in the events_intraday_ table, as well as the historical data in the events_ table.

  3. Use Google Cloud Functions: You could use Google Cloud Functions to trigger your data extraction process based on changes in your BigQuery dataset. For example, you could set up a Cloud Function to be triggered whenever a new table is added to your dataset. This would ensure that your data extraction process only runs after the data migration has occurred.

 

 

Under our understanding, the intraday events is a copy of the events data at a point in time, and the events data should capture the full activity records. However, for some days we see that the intraday events records seem higher than the events data . Do we have a possible explanation of why this is the case? Can records get deleted between intraday and the events data?

Yes, the scenario you described can happen, and there are several potential explanations for why the intraday events records might be higher than the events data for some days:

  1. Data Deduplication:

    • It's possible that the intraday table captures raw events, including duplicates or erroneous entries. When migrating this data to the events table, a deduplication process might be applied, removing any duplicate records. This would result in a reduced count in the events table compared to the intraday table.
  2. Data Filtering:

    • There might be a filtering process applied when migrating data from intraday to events. This could involve removing records that don't meet certain criteria, such as records with missing fields or records that are deemed invalid or irrelevant.
  3. Data Retention Policies:

    • BigQuery tables can have data retention policies applied to them. If the events table has a shorter retention policy than the intraday table, older records might be deleted from the events table, leading to a discrepancy in record counts.
  4. Streaming Data:

    • If you're using BigQuery's streaming data feature, there can be a delay between when data is streamed into the intraday table and when it's available for querying. This could lead to temporary discrepancies in record counts between the two tables.
  5. Manual Interventions:

    • Someone with the necessary permissions might have manually deleted or modified records in the events table after the data migration from intraday. This could be due to data correction, compliance requirements, or other reasons.
  6. Errors in Migration Process:

    • If there's an automated process that handles the migration of data from intraday to events, there might be errors or bugs in this process that result in data loss or discrepancies.
  7. Partitioned Tables:

    • If the events table is partitioned (e.g., by date), and you're querying a specific partition, ensure that you're comparing the correct partitions between the intraday and events tables. A mismatch in queried partitions can lead to discrepancies in record counts.

To determine the exact cause, you'd need to review the processes and configurations related to data migration, retention, and querying in your BigQuery setup. If records are indeed getting deleted between intraday and events, it's crucial to identify the reason to ensure data integrity and accuracy.