Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

BigQuery table restoration usecases

Hi Folks,

How can we perform the following restoration in BigQuery?

  • Need to be able to restore any tables back in PIT (Point-In-Time).
  • We load the data daily and if we find any bad data that loaded last night need to be backout meaning we need to be able to bring the tables back to the time before we load the data.
  • Roll ALL new data and updated data back to the previous run.

Any leads on this with simple queries on a dummy table are much appreciated.

Thanks,

Vigneswar

Solved Solved
0 6 4,704
1 ACCEPTED SOLUTION

Hi @vigneswar17 ,

As @mdideles pointed out. BigQuery has a feature called "Time Travel" that you can use to query and restore tables to a specific Point-In-Time (PIT) in the past. Here's how it works:

Understanding Time Travel in BigQuery:

  • Time Travel is enabled by default, allowing you to access historical data for up to 7 days.
  • To change the retention period, go to the BigQuery console and adjust the "Time Travel window" setting for your dataset.

Querying a Table at a Specific PIT:

Use the FOR SYSTEM_TIME AS OF clause in your SQL query to view the table as it was in the past. For example, to see the table's state from 24 hours ago:

 
SELECT * FROM `project.dataset.table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);

Restoring the Table to a PIT:

  1. Create a new table using the historical version:

     
    CREATE OR REPLACE TABLE `project.dataset.table_restored` 
    AS
    SELECT * FROM `project.dataset.table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
    

Example with a Dummy Table:

  1. Create a table:

     
    CREATE TABLE `project.dataset.my_table` (id INT64, name STRING, last_updated TIMESTAMP);
    
  2. Insert good data:

     
    INSERT INTO `project.dataset.my_table` VALUES (1, 'Alice', CURRENT_TIMESTAMP()), (2, 'Bob', CURRENT_TIMESTAMP());
    
  3. Simulate a bad data load:

     
    UPDATE `project.dataset.my_table` SET name = 'Bad Data' WHERE id = 1;
    
  4. Query at previous PIT (before bad data was loaded):

     
    SELECT * FROM `project.dataset.my_table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE);
    
  5. Restore to previous PIT:

     
    CREATE OR REPLACE TABLE `project.dataset.my_table_restored` 
    AS 
    SELECT * FROM `project.dataset.my_table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE);
    

Important Considerations:

  • Data is only available for the set retention period (default 7 days). Plan accordingly.
  • CREATE OR REPLACE overwrites any existing table with the specified name. Choose an appropriate name for your restored table.
  • Adjust PIT in your queries based on when the error was introduced.
  • For granular control and longer-term backups,consider using regular table snapshots.

View solution in original post

6 REPLIES 6

Hi @vigneswar17 ,

As @mdideles pointed out. BigQuery has a feature called "Time Travel" that you can use to query and restore tables to a specific Point-In-Time (PIT) in the past. Here's how it works:

Understanding Time Travel in BigQuery:

  • Time Travel is enabled by default, allowing you to access historical data for up to 7 days.
  • To change the retention period, go to the BigQuery console and adjust the "Time Travel window" setting for your dataset.

Querying a Table at a Specific PIT:

Use the FOR SYSTEM_TIME AS OF clause in your SQL query to view the table as it was in the past. For example, to see the table's state from 24 hours ago:

 
SELECT * FROM `project.dataset.table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);

Restoring the Table to a PIT:

  1. Create a new table using the historical version:

     
    CREATE OR REPLACE TABLE `project.dataset.table_restored` 
    AS
    SELECT * FROM `project.dataset.table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
    

Example with a Dummy Table:

  1. Create a table:

     
    CREATE TABLE `project.dataset.my_table` (id INT64, name STRING, last_updated TIMESTAMP);
    
  2. Insert good data:

     
    INSERT INTO `project.dataset.my_table` VALUES (1, 'Alice', CURRENT_TIMESTAMP()), (2, 'Bob', CURRENT_TIMESTAMP());
    
  3. Simulate a bad data load:

     
    UPDATE `project.dataset.my_table` SET name = 'Bad Data' WHERE id = 1;
    
  4. Query at previous PIT (before bad data was loaded):

     
    SELECT * FROM `project.dataset.my_table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE);
    
  5. Restore to previous PIT:

     
    CREATE OR REPLACE TABLE `project.dataset.my_table_restored` 
    AS 
    SELECT * FROM `project.dataset.my_table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE);
    

Important Considerations:

  • Data is only available for the set retention period (default 7 days). Plan accordingly.
  • CREATE OR REPLACE overwrites any existing table with the specified name. Choose an appropriate name for your restored table.
  • Adjust PIT in your queries based on when the error was introduced.
  • For granular control and longer-term backups,consider using regular table snapshots.

Thanks for the information, it really helps.

But when I execute below statement getting an error on timestamp.

SELECT * FROM `dwh-01-svc-prj-11.dnaus3.test` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE);
 
Invalid time travel timestamp 1705579511054 for table dwh-01-svc-prj-11:dnaus3.test@1705579511054. Cannot read before 1705580605061
 
Any thoughts on this?

Here is how to troubleshoot the "Invalid time travel timestamp" error you're encountering in BigQuery:

Understanding the Error:

The error message you're seeing, "Invalid time travel timestamp...” means that the specific timestamp you're using with the FOR SYSTEM_TIME AS OF clause is too far in the past. This is because the Time Travel window (the amount of historical data retained) for the table is shorter than the duration you're trying to go back.

Troubleshooting Steps:

  1. Check the Time Travel Window:

    • Go to the BigQuery Console, find the dataset (dnaus3) that contains your table.
    • Click on "Edit Dataset" and look for the "Time Travel Window" setting. This will tell you how far back BigQuery's Time Travel will allow you to go.
    • If the Time Travel Window isn't long enough, you can increase it (up to 7 days).
  2. Adjust the Interval:

    • Modify the INTERVAL in the TIMESTAMP_SUB function to specify a more recent time that is within the Time Travel window for your table. For example,if Time Travel only goes back 2 hours, try:
     
    SELECT * FROM `dwh-01-svc-prj-11.dnaus3.test` 
    FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
    
  3. Confirm Table Existence at PIT:

    • If you are still getting the error, check if the table or data actually existed at the historical time you are trying to query. A table might have been created after the PIT you are specifying.

Example Scenario & Solution:

Let's imagine that:

  • Your current time is 10:00 AM
  • The table’s Time Travel window is set to 4 hours.

In this case, this query would fail because 9:30 AM is outside of the 4 hour window:

 
-- Will Fail:
SELECT * FROM `project.dataset.table` 
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE); 

Instead, you would adjust the INTERVAL to go back a shorter amount of time:

 
-- Should Work (Going back only 30 minutes):
SELECT * FROM `project.dataset.table` 
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE); 

Additional Notes:

  • The exact error message (particularly the specific timestamps included) might vary slightly, but the principle of checking the Time Travel window and adjusting your PIT remains the same.
  • If you still have problems after trying these steps, provide the following details so I can assist further:
    • The exact error message
    • The Time Travel setting for your dataset
    • An explanation of when you need to query the table (what time in the past).

Thank you so much!!
I don't see any difference in the queries,

-- Will Fail:
SELECT * FROM `project.dataset.table` 
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE); 
-- Should Work (Going back only 30 minutes):
SELECT * FROM `project.dataset.table` 
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE); 

I have 2 cases where to restore data in within 7 days and also beyond those 7 days (months before).

Will this approach able to retrieve the data even for retrieving months old data as well if we adjust the query interval? Or do we have a separate approach for this scenario?
 
Also, how pricing is calculated for these restoration queries?

Apologies for the confusion earlier. Let's clarify the approach for data recovery in BigQuery, especially for periods beyond the 7-day Time Travel window:

1. Retrieving Data Beyond the 7-Day Time Travel Window:

BigQuery's built-in Time Travel feature is convenient but is limited to a 7-day window. To recover data from older points in time, you'll need to consider the following strategies:

  • Option 1: Regular Backups/Snapshots:
    • Proactively create regular backups or snapshots of your essential BigQuery tables. Store these backups in Google Cloud Storage for long-term preservation.
    • To restore data from a previous time, you can load from these backups into BigQuery.
    • Refer to the BigQuery documentation for guidance on creating snapshots: https://cloud.google.com/bigquery/docs/table-snapshots-intro
  • Option 2: Data History in Application Logic:
    • If you need to recover very specific data (e.g., revert a few records to a previous state), you could implement data history or an audit log directly within your application's logic.
    • This approach can offer more granular control over recovery but requires additional development effort.

2. Pricing of Restoration Queries:

  • BigQuery charges for queries based on the amount of data processed. This applies to both standard queries and Time Travel queries.
  • The cost of restoring data will depend on the size of the data being queried at that specific point in time.
  • Remember that storing snapshots/backups will incur storage costs in Google Cloud Storage.
  • For up-to-date pricing information, refer to the BigQuery pricing page: https://cloud.google.com/bigquery/pricing

Example for Restoring Data within the 7-Day Time Travel Window:

To restore a table to its state from yesterday, you would use a Time Travel query like this:

 
-- Query to go back one day:
SELECT * FROM `project.dataset.table` 
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
  • Backup Strategy: Carefully decide on a backup strategy that aligns with your business's data retention and recovery needs. Regular backups are crucial.
  • Testing Recovery Process: Make sure you regularly test your backup and recovery processes to ensure you can recover data successfully when needed.