Hi
I have created a view in BigQuery that pulls data from several BigQuery tables, which are populated from Google Sheets by scheduled refreshes at various intervals. How do I create a new column in my view that shows the last scheduled refresh of one of the tables?
I hope that makes sense.
Thanks!
You can create a new column in your BigQuery view that pulls the last scheduled refresh time from one of the tables by using the _PARTITIONTIME pseudo-column or the LAST_MODIFIED_TIME metadata. Here's how you can do it:
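Using _PARTITIONTIME (for ingestion-time partitioned tables)
If the table you're pulling data from is partitioned by ingestion time, you can use the _PARTITIONTIME pseudo-column to approximate the last refresh. The value is the start of the partition the rows were loaded into (for example, the start of the day for daily partitions), not the exact load time. Here's an example SQL query: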
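CREATE OR REPLACE VIEW `your_project.your_dataset.your_view` AS
SELECT
  *,
  -- Latest partition timestamp in the table, repeated on every row.
  -- A scalar subquery avoids mixing SELECT * with GROUP BY, which is invalid.
  (SELECT MAX(_PARTITIONTIME)
   FROM `your_project.your_dataset.your_table`) AS last_refresh_time
FROM
  `your_project.your_dataset.your_table`;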
Using LAST_MODIFIED_TIME (for non-partitioned tables)
If the table is not partitioned, you can use the LAST_MODIFIED_TIME metadata to get the last refresh time. Here's an example:
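CREATE OR REPLACE VIEW `your_project.your_dataset.your_view` AS
SELECT
  *,
  -- last_modified_time in __TABLES__ is in milliseconds since the Unix epoch,
  -- so convert it with TIMESTAMP_MILLIS rather than TIMESTAMP_MICROS.
  TIMESTAMP_MILLIS(
    (SELECT last_modified_time
     FROM `your_project.your_dataset.__TABLES__`
     WHERE table_id = 'your_table')
  ) AS last_refresh_time
FROM
  `your_project.your_dataset.your_table`;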
Make sure to replace your_project, your_dataset, your_view, and your_table with the appropriate names for your project, dataset, view, and table.
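Before building the metadata lookup into the view, it may help to check what it would report. The query below is a small sketch that lists every table in the dataset with its last-modified time converted to a TIMESTAMP; it assumes the same placeholder names as above and that the metadata value is stored in milliseconds since the Unix epoch.

```sql
-- List each table in the dataset with its last-modified time, newest first.
-- Assumes last_modified_time is an INT64 in milliseconds since the Unix epoch.
SELECT
  table_id,
  TIMESTAMP_MILLIS(last_modified_time) AS last_modified
FROM
  `your_project.your_dataset.__TABLES__`
ORDER BY
  last_modified DESC;
```

If the value shown for your table lines up with the schedule of your Google Sheets refresh, the metadata approach is probably good enough for your view.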
Note that the LAST_MODIFIED_TIME approach gives you the last time the table itself was modified, which may or may not correspond to the last refresh of the data from Google Sheets, depending on how your pipeline is set up. If you have control over the refresh process, consider adding a timestamp column to the table that is set to the current time whenever the data is refreshed; you can then simply select that column in your view.
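The timestamp-column idea could look roughly like the sketch below. It assumes the refresh is a scheduled query that you control and that it rebuilds the whole table on each run. The names your_sheets_source and refreshed_at are hypothetical placeholders, not anything from your setup.

```sql
-- Scheduled refresh query (hypothetical): rebuild the table and stamp
-- every row with the time of this run.
CREATE OR REPLACE TABLE `your_project.your_dataset.your_table` AS
SELECT
  *,
  CURRENT_TIMESTAMP() AS refreshed_at  -- hypothetical column name
FROM
  `your_project.your_dataset.your_sheets_source`;  -- hypothetical source table

-- View: expose the stamp under the name used in the earlier examples.
CREATE OR REPLACE VIEW `your_project.your_dataset.your_view` AS
SELECT
  * EXCEPT (refreshed_at),
  refreshed_at AS last_refresh_time
FROM
  `your_project.your_dataset.your_table`;
```

Because the table is rebuilt in full on each run, every row carries the same refreshed_at value, so the view can select it directly without an aggregate. If your refresh appends rows instead, you would need MAX(refreshed_at) in a scalar subquery, as in the _PARTITIONTIME example.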