Hi Folks,
How can we perform the following restoration in BigQuery?
Any leads on this with simple queries on a dummy table are much appreciated.
Thanks,
Vigneswar
Solved! Go to Solution.
Hi @vigneswar17 ,
As @mdideles pointed out. BigQuery has a feature called "Time Travel" that you can use to query and restore tables to a specific Point-In-Time (PIT) in the past. Here's how it works:
Understanding Time Travel in BigQuery:
Querying a Table at a Specific PIT:
Use the FOR SYSTEM_TIME AS OF
clause in your SQL query to view the table as it was in the past. For example, to see the table's state from 24 hours ago:
SELECT * FROM `project.dataset.table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
Restoring the Table to a PIT:
Create a new table using the historical version:
CREATE OR REPLACE TABLE `project.dataset.table_restored`
AS
SELECT * FROM `project.dataset.table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
Example with a Dummy Table:
Create a table:
CREATE TABLE `project.dataset.my_table` (id INT64, name STRING, last_updated TIMESTAMP);
Insert good data:
INSERT INTO `project.dataset.my_table` VALUES (1, 'Alice', CURRENT_TIMESTAMP()), (2, 'Bob', CURRENT_TIMESTAMP());
Simulate a bad data load:
UPDATE `project.dataset.my_table` SET name = 'Bad Data' WHERE id = 1;
Query at previous PIT (before bad data was loaded):
SELECT * FROM `project.dataset.my_table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE);
Restore to previous PIT:
CREATE OR REPLACE TABLE `project.dataset.my_table_restored`
AS
SELECT * FROM `project.dataset.my_table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE);
Important Considerations:
CREATE OR REPLACE
overwrites any existing table with the specified name. Choose an appropriate name for your restored table.Have you tried the Time Travel? - Access historical data | BigQuery | Google Cloud
Hi @vigneswar17 ,
As @mdideles pointed out. BigQuery has a feature called "Time Travel" that you can use to query and restore tables to a specific Point-In-Time (PIT) in the past. Here's how it works:
Understanding Time Travel in BigQuery:
Querying a Table at a Specific PIT:
Use the FOR SYSTEM_TIME AS OF
clause in your SQL query to view the table as it was in the past. For example, to see the table's state from 24 hours ago:
SELECT * FROM `project.dataset.table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
Restoring the Table to a PIT:
Create a new table using the historical version:
CREATE OR REPLACE TABLE `project.dataset.table_restored`
AS
SELECT * FROM `project.dataset.table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
Example with a Dummy Table:
Create a table:
CREATE TABLE `project.dataset.my_table` (id INT64, name STRING, last_updated TIMESTAMP);
Insert good data:
INSERT INTO `project.dataset.my_table` VALUES (1, 'Alice', CURRENT_TIMESTAMP()), (2, 'Bob', CURRENT_TIMESTAMP());
Simulate a bad data load:
UPDATE `project.dataset.my_table` SET name = 'Bad Data' WHERE id = 1;
Query at previous PIT (before bad data was loaded):
SELECT * FROM `project.dataset.my_table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE);
Restore to previous PIT:
CREATE OR REPLACE TABLE `project.dataset.my_table_restored`
AS
SELECT * FROM `project.dataset.my_table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE);
Important Considerations:
CREATE OR REPLACE
overwrites any existing table with the specified name. Choose an appropriate name for your restored table.Thanks for the information, it really helps.
But when I execute below statement getting an error on timestamp.
Here is how to troubleshoot the "Invalid time travel timestamp" error you're encountering in BigQuery:
Understanding the Error:
The error message you're seeing, "Invalid time travel timestamp...” means that the specific timestamp you're using with the FOR SYSTEM_TIME AS OF
clause is too far in the past. This is because the Time Travel window (the amount of historical data retained) for the table is shorter than the duration you're trying to go back.
Troubleshooting Steps:
Check the Time Travel Window:
dnaus3
) that contains your table.Adjust the Interval:
INTERVAL
in the TIMESTAMP_SUB
function to specify a more recent time that is within the Time Travel window for your table. For example,if Time Travel only goes back 2 hours, try:SELECT * FROM `dwh-01-svc-prj-11.dnaus3.test`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
Confirm Table Existence at PIT:
Example Scenario & Solution:
Let's imagine that:
In this case, this query would fail because 9:30 AM is outside of the 4 hour window:
-- Will Fail:
SELECT * FROM `project.dataset.table`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE);
Instead, you would adjust the INTERVAL
to go back a shorter amount of time:
-- Should Work (Going back only 30 minutes):
SELECT * FROM `project.dataset.table`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE);
Additional Notes:
Thank you so much!!
I don't see any difference in the queries,
-- Will Fail: SELECT * FROM `project.dataset.table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE);
-- Should Work (Going back only 30 minutes): SELECT * FROM `project.dataset.table` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE);
I have 2 cases where to restore data in within 7 days and also beyond those 7 days (months before).
Apologies for the confusion earlier. Let's clarify the approach for data recovery in BigQuery, especially for periods beyond the 7-day Time Travel window:
1. Retrieving Data Beyond the 7-Day Time Travel Window:
BigQuery's built-in Time Travel feature is convenient but is limited to a 7-day window. To recover data from older points in time, you'll need to consider the following strategies:
2. Pricing of Restoration Queries:
Example for Restoring Data within the 7-Day Time Travel Window:
To restore a table to its state from yesterday, you would use a Time Travel query like this:
-- Query to go back one day:
SELECT * FROM `project.dataset.table`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);