GCP Bigquery Time Travel for Physical Billing

soniacolesniuc · 06-06-2023 11:11 PM

I'm trying to see if switching my dataset to physical billing would lower the costs and it says the total storage costs will include the bytes used for time travel storage. How does time travel work in this instance?

If my dataset tables are updated daily, how would time travel affect the price compared to an identical but logical dataset?

I ran the query that would return some superficial savings, but it's not very clear how time travel will affect it

ms4446

Time travel refers to the ability to access data that has been changed or deleted within a specific time window. By default, this time window covers the past seven days. With time travel, you can query data that was updated or deleted, restore a table that was deleted, or restore a table that expired. You can also adjust the time travel window to balance storage costs with your data retention needs, with a minimum of two days and a maximum of seven days.

The total storage costs you're billed for will include the bytes used for time travel storage if you set your storage billing model to use physical bytes. Conversely, if you set your storage billing model to use logical bytes, the total storage costs you are billed for will not include the bytes used for time travel storage.

If your dataset tables are updated daily, time travel would affect the price because it would store the changes made every day for the duration of the time travel window, which can be up to seven days. Therefore, the more changes made to your data, the higher the cost because more storage space would be needed for time travel. This would be the case if you're using the physical bytes billing model. If you're using the logical bytes model, these changes would not affect the cost because time travel storage is not included in the billing.

The pricing may depend on various factors such as the amount of data stored for time travel, the duration of the time travel window, and the specific billing model (physical bytes vs logical bytes) you're using.

satishm

Hi @ms4446 , Thanks for the information. How does the time travel store the data ? does it store incremental changes or the entire table when ever there was change done to the table ? Just wondering if it keeps number of copies of the table for the time travel window for all the changes.

ms4446

In BigQuery, time travel stores incremental changes to the data, rather than storing the entire table for each change. This approach helps reduce the storage space required for time travel data.

For instance, let's consider a table with 100 rows. If a change is made to one of the rows, only the changes made to that specific row will be stored in the time travel data. The remaining 99 rows, which haven't been modified, will not be duplicated in the time travel storage.

Consequently, the number of copies of the table stored in the time travel data corresponds to the number of changes made to the table. For instance, if there have been 10 changes made to a table, then 10 copies of the table will be stored in the time travel data.

The time travel window refers to the duration for which historical data is retained. By default, the time travel window in BigQuery is set to 7 days. This means that you can access data that has been changed or deleted within the past 7 days using the time travel feature.

satishm

Thanks for your response. You have mentioned above that "For instance, if there have been 10 changes made to a table, then 10 copies of the table will be stored in the time travel data" - doesn't mean the entire table copies ? only the incremental changes of the table ? correct ? Just wanted to confirm my understanding.

ms4446

You are correct. The 10 copies of the table that are stored in time travel only contain the incremental changes that have been made to the table. The entire table is not stored.

For example, if you have a table with 100 rows and you make 10 changes to the table, then the 10 copies of the table that are stored in time travel will only contain the changes that have been made to the 100 rows. The other 90 rows will not be stored, because they have not changed.

This means that the amount of storage space that is required for time travel data is significantly less than the amount of storage space that would be required if the entire table was stored for each change.

satishm

Hi @ms4446 , Have another question ? How does the time travel bytes work for the external tables ? Thanks again for your response.

ms4446

BigQuery's time travel feature only works with data stored in BigQuery that has been changed or deleted. Time travel bytes for external tables are not available because BigQuery does not store the data for external tables in its own storage. Instead, BigQuery only stores the metadata for external tables, such as the table schema and the location of the data in the external data source.

However, if you update or delete data in an external data source that is linked to an external table in BigQuery, the changes will be reflected in the external table. This is because BigQuery will periodically poll the external data source for changes. If it detects any changes, it will update the metadata for the external table to reflect the changes.

So, while time travel bytes are not available for external tables, you can still use BigQuery's time travel feature to query historical data from your external data sources. Just keep in mind that you will only be able to query the data that was present in the external data source at the time of the query.