Hello,
We have a table function where we use it to populate a view, and in the function I would like to refer to a materialized view or a table to make sure the profiles (ID) are not there because they are deleted.
Table function is roughly like this:
Select a bunch of arrays and structs (unnest them too) and rename them to populate the view, which is essentially a transformation of raw data,
From clause where we refer to the raw table,
Where clause where it filters out null values, but also should filter out profiles that are deleted.
What I came up with as below example: a window function to get the last operation type, and if the last operation type is DELETE, then it means profile is deleted (it is possible for profile to be inserted back again, thus we need to check it either very often, or every time there is a change to table, and also because profiles are updated quite frequently). Which brings to my mind what to create in order to be able to refer to these deleted profile values inside the where clause of the table function. Is it possible to use a materialized view for the below query and refer to it in the above table function at the where clause? I know window function is not supported now but is there another way? Should we just populate a table with a scheduled query using below, but I am afraid we will have problems with latency of updates of the schedules. Or should I just use the below query inside the where function as a nested query?
Solved! Go to Solution.
Based on your requirements and concerns, I see a few potential solutions.
RAW_TABLE
and the complexity of the rest of your table function, this could be computationally expensive and slow. Here's how this might look:SELECT
-- Your arrays, structs, etc.
FROM
`RAW_TABLE`
WHERE
-- Your existing conditions AND
profile_id NOT IN (
SELECT profile_id
FROM (
SELECT profile_id, profile_operation_type, ROW_NUMBER() OVER (PARTITION BY profile_id ORDER BY profile_operation_type, profile_updated_at ASC) as rn
FROM `RAW_TABLE`
)
WHERE rn = 1 AND profile_operation_type = 'DELETE'
)
CREATE OR REPLACE TEMPORARY TABLE temp_deleted_profiles AS
SELECT profile_id, profile_operation_type
FROM (
SELECT profile_id, profile_operation_type, ROW_NUMBER() OVER (PARTITION BY profile_id ORDER BY profile_operation_type, profile_updated_at ASC) as rn
FROM `RAW_TABLE`
)
WHERE rn = 1 and profile_operation_type = 'DELETE';
Then refer to this temporary table in your main table function:
SELECT
-- Your arrays, structs, etc.
FROM
`RAW_TABLE`
WHERE
-- Your existing conditions AND
profile_id NOT IN (SELECT profile_id FROM temp_deleted_profiles)
Each of these options has its pros and cons, and the best choice depends on your specific use case, requirements, and constraints. Consider factors such as the size and update frequency of your RAW_TABLE
, your performance needs, and how up-to-date you need your data to be.
Based on your requirements and concerns, I see a few potential solutions.
RAW_TABLE
and the complexity of the rest of your table function, this could be computationally expensive and slow. Here's how this might look:SELECT
-- Your arrays, structs, etc.
FROM
`RAW_TABLE`
WHERE
-- Your existing conditions AND
profile_id NOT IN (
SELECT profile_id
FROM (
SELECT profile_id, profile_operation_type, ROW_NUMBER() OVER (PARTITION BY profile_id ORDER BY profile_operation_type, profile_updated_at ASC) as rn
FROM `RAW_TABLE`
)
WHERE rn = 1 AND profile_operation_type = 'DELETE'
)
CREATE OR REPLACE TEMPORARY TABLE temp_deleted_profiles AS
SELECT profile_id, profile_operation_type
FROM (
SELECT profile_id, profile_operation_type, ROW_NUMBER() OVER (PARTITION BY profile_id ORDER BY profile_operation_type, profile_updated_at ASC) as rn
FROM `RAW_TABLE`
)
WHERE rn = 1 and profile_operation_type = 'DELETE';
Then refer to this temporary table in your main table function:
SELECT
-- Your arrays, structs, etc.
FROM
`RAW_TABLE`
WHERE
-- Your existing conditions AND
profile_id NOT IN (SELECT profile_id FROM temp_deleted_profiles)
Each of these options has its pros and cons, and the best choice depends on your specific use case, requirements, and constraints. Consider factors such as the size and update frequency of your RAW_TABLE
, your performance needs, and how up-to-date you need your data to be.
Hi,
Thanks a lot for guidance. After some more time going into documentation, I believe we can set non-incremental mat views with some data staleness that we can tolerate. I came up with this: