Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

BigQuery table restoration usecases

Hi Folks,

How can we perform the following restoration in BigQuery?

  • Need to be able to restore any tables back in PIT (Point-In-Time).
  • We load the data daily and if we find any bad data that loaded last night need to be backout meaning we need to be able to bring the tables back to the time before we load the data.
  • Roll ALL new data and updated data back to the previous run.

Any leads on this with simple queries on a dummy table are much appreciated.

Thanks,

Vigneswar

Solved Solved
0 6 4,735
1 ACCEPTED SOLUTION

Hi @vigneswar17 ,

As @mdideles pointed out. BigQuery has a feature called "Time Travel" that you can use to query and restore tables to a specific Point-In-Time (PIT) in the past. Here's how it works:

Understanding Time Travel in BigQuery:

  • Time Travel is enabled by default, allowing you to access historical data for up to 7 days.
  • To change the retention period, go to the BigQuery console and adjust the "Time Travel window" setting for your dataset.

Querying a Table at a Specific PIT:

Use the FOR SYSTEM_TIME AS OF clause in your SQL query to view the table as it was in the past. For example, to see the table's state from 24 hours ago:

 
SELECT * FROM `project.dataset.table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);

Restoring the Table to a PIT:

  1. Create a new table using the historical version:

     
    CREATE OR REPLACE TABLE `project.dataset.table_restored` 
    AS
    SELECT * FROM `project.dataset.table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
    

Example with a Dummy Table:

  1. Create a table:

     
    CREATE TABLE `project.dataset.my_table` (id INT64, name STRING, last_updated TIMESTAMP);
    
  2. Insert good data:

     
    INSERT INTO `project.dataset.my_table` VALUES (1, 'Alice', CURRENT_TIMESTAMP()), (2, 'Bob', CURRENT_TIMESTAMP());
    
  3. Simulate a bad data load:

     
    UPDATE `project.dataset.my_table` SET name = 'Bad Data' WHERE id = 1;
    
  4. Query at previous PIT (before bad data was loaded):

     
    SELECT * FROM `project.dataset.my_table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE);
    
  5. Restore to previous PIT:

     
    CREATE OR REPLACE TABLE `project.dataset.my_table_restored` 
    AS 
    SELECT * FROM `project.dataset.my_table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE);
    

Important Considerations:

  • Data is only available for the set retention period (default 7 days). Plan accordingly.
  • CREATE OR REPLACE overwrites any existing table with the specified name. Choose an appropriate name for your restored table.
  • Adjust PIT in your queries based on when the error was introduced.
  • For granular control and longer-term backups,consider using regular table snapshots.

View solution in original post

6 REPLIES 6