Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

BQ Time travel with Analytics Hub's Linked dataset

Hi team,

I've setup a data exchange and created a listing in it. I subscribed to it and this created a linked dataset in my target project (different from source project). When I try to run time travel queries for tables in my linked dataset  (select * from target_project.linked_dataset.standard_table for system_time as of timestamp() ) they fail with Not found: Table standard_table:linked_dataset.standard_table@1713797390587 was not found in location US at [1:1] error even though the timestamp used in the query is earlier than 2 days (minimum time travel retention period). I've verified that both my source dataset (aka shared dataset according to Analytics Hub ) and linked datasets are in the same regions (US).
My question is how can we use time travel for standard tables in linked dataset to retrieve the data at some time in near past?
Thanks

Solved Solved
0 7 1,182
1 ACCEPTED SOLUTION

The error you're getting suggests there's a disconnect between how time travel is normally handled in BigQuery and how it interacts with linked datasets from Analytics Hub. While time travel works seamlessly on native BigQuery tables, linked datasets present a unique scenario.

Why Standard Time Travel Doesn't Work as Expected

  • Linked Data Nature: Linked datasets in Analytics Hub act as references or pointers to the original data in the source project. They are not physical copies of the data.
  • Time Travel Implementation: Time travel relies on historical snapshots of table data. Since linked datasets are references, these snapshots aren't automatically created for them.

Refined Approach for Time Travel with Linked Datasets

  1. Identify Source Tables: Determine the exact tables in the source project's dataset that correspond to the linked dataset tables you're trying to query.

  2. Direct Time Travel on Source: Instead of querying the linked dataset directly with time travel, construct your queries to target the source tables in the original dataset.

    • Example: If your linked dataset is target_project.linked_dataset.standard_table, find its corresponding source table (e.g., source_project.source_dataset.standard_table) and use this in your time travel query:

       
      SELECT *
      FROM source_project.source_dataset.standard_table FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
      

Additional Considerations

  • Permissions: Ensure that you have the necessary permissions to access both the source project and the linked dataset.
  • Egress Restrictions (If Applicable): Double-check if any data egress restrictions have been inadvertently applied to the listing. If so, you'll need to adjust them to allow time travel queries on the source data.
  • Alternative (Data Copies): If time travel is a critical requirement and direct querying of the source is not feasible, you might consider creating copies of the relevant tables within your target project. These copies would then allow for standard time travel operations.

Example (Complete Flow)

  1. Listing in Data Exchange (Source Project): source_project.source_dataset (shared as source_dataset)

  2. Linked Dataset (Target Project): target_project.linked_dataset

  3. Query:

    SELECT *
    FROM source_project.source_dataset.standard_table FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
    

 

View solution in original post

7 REPLIES 7