Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Refer to a materialized view inside a Table function

Hello, 

We have a table function where we use it to populate a view, and in the function I would like to refer to a materialized view or a table to make sure the profiles (ID) are not there because they are deleted. 

Table function is roughly like this:

Select a bunch of arrays and structs (unnest them too) and rename them to populate the view, which is essentially a transformation of raw data,

From clause where we refer to the raw table,

Where clause where it filters out null values, but also should filter out profiles that are deleted.

What I came up with as below example: a window function to get the last operation type, and if the last operation type is DELETE, then it means profile is deleted (it is possible for profile to be inserted back again, thus we need to check it either very often, or every time there is a change to table, and also because profiles are updated quite frequently). Which brings to my mind what to create in order to be able to refer to these deleted profile values inside the where clause of the table function. Is it possible to use a materialized view for the below query and refer to it in the above table function at the where clause? I know window function is not supported now but is there another way? Should we just populate a table with a scheduled query using below, but I am afraid we will have problems with latency of updates of the schedules. Or should I just use the below query inside the where function as a nested query? 

 

SELECT profile_id,profile_operation_type from (

SELECT profile_id,profile_operation_type, ROW_NUMBER() OVER (PARTITION BY profile_id ORDER BY profile_operation_type, profile_updated_at  ASC) as rn
FROM `RAW_TABLE` )

where rn = 1 and profile_operation_type = 'DELETE'
Solved Solved
0 2 897
1 ACCEPTED SOLUTION

Based on your requirements and concerns, I see a few potential solutions.

  1. Use a subquery: You could simply include the query you provided as a subquery in the WHERE clause of your table function. This would ensure that the data is always up-to-date, but depending on the size of your RAW_TABLE and the complexity of the rest of your table function, this could be computationally expensive and slow. Here's how this might look:

SELECT
-- Your arrays, structs, etc.
FROM
`RAW_TABLE`
WHERE
-- Your existing conditions AND
profile_id NOT IN (
SELECT profile_id
FROM (
SELECT profile_id, profile_operation_type, ROW_NUMBER() OVER (PARTITION BY profile_id ORDER BY profile_operation_type, profile_updated_at ASC) as rn
FROM `RAW_TABLE`
)
WHERE rn = 1 AND profile_operation_type = 'DELETE'

)

  1. Temporary Table: You could create a temporary table with your deletion check query and refer to that table in your table function. This would help reduce the computational load on your table function, but you would need to manage updating the temporary table yourself to ensure it remains accurate. Scheduled queries could be used for this, but as you mentioned, you might run into issues with update latency.

CREATE OR REPLACE TEMPORARY TABLE temp_deleted_profiles AS
SELECT profile_id, profile_operation_type
FROM (
SELECT profile_id, profile_operation_type, ROW_NUMBER() OVER (PARTITION BY profile_id ORDER BY profile_operation_type, profile_updated_at ASC) as rn
FROM `RAW_TABLE`
)
WHERE rn = 1 and profile_operation_type = 'DELETE';

Then refer to this temporary table in your main table function:

SELECT
-- Your arrays, structs, etc.
FROM
`RAW_TABLE`
WHERE
-- Your existing conditions AND
profile_id NOT IN (SELECT profile_id FROM temp_deleted_profiles)

  1. Materialized View: As you pointed out, BigQuery does not support window functions in materialized views​​. However, if you can find a way to rewrite your query without window functions, a materialized view could be a good option because it would provide a balance between data freshness and query performance.

Each of these options has its pros and cons, and the best choice depends on your specific use case, requirements, and constraints. Consider factors such as the size and update frequency of your RAW_TABLE, your performance needs, and how up-to-date you need your data to be.

View solution in original post

2 REPLIES 2

Based on your requirements and concerns, I see a few potential solutions.

  1. Use a subquery: You could simply include the query you provided as a subquery in the WHERE clause of your table function. This would ensure that the data is always up-to-date, but depending on the size of your RAW_TABLE and the complexity of the rest of your table function, this could be computationally expensive and slow. Here's how this might look:

SELECT
-- Your arrays, structs, etc.
FROM
`RAW_TABLE`
WHERE
-- Your existing conditions AND
profile_id NOT IN (
SELECT profile_id
FROM (
SELECT profile_id, profile_operation_type, ROW_NUMBER() OVER (PARTITION BY profile_id ORDER BY profile_operation_type, profile_updated_at ASC) as rn
FROM `RAW_TABLE`
)
WHERE rn = 1 AND profile_operation_type = 'DELETE'

)

  1. Temporary Table: You could create a temporary table with your deletion check query and refer to that table in your table function. This would help reduce the computational load on your table function, but you would need to manage updating the temporary table yourself to ensure it remains accurate. Scheduled queries could be used for this, but as you mentioned, you might run into issues with update latency.

CREATE OR REPLACE TEMPORARY TABLE temp_deleted_profiles AS
SELECT profile_id, profile_operation_type
FROM (
SELECT profile_id, profile_operation_type, ROW_NUMBER() OVER (PARTITION BY profile_id ORDER BY profile_operation_type, profile_updated_at ASC) as rn
FROM `RAW_TABLE`
)
WHERE rn = 1 and profile_operation_type = 'DELETE';

Then refer to this temporary table in your main table function:

SELECT
-- Your arrays, structs, etc.
FROM
`RAW_TABLE`
WHERE
-- Your existing conditions AND
profile_id NOT IN (SELECT profile_id FROM temp_deleted_profiles)

  1. Materialized View: As you pointed out, BigQuery does not support window functions in materialized views​​. However, if you can find a way to rewrite your query without window functions, a materialized view could be a good option because it would provide a balance between data freshness and query performance.

Each of these options has its pros and cons, and the best choice depends on your specific use case, requirements, and constraints. Consider factors such as the size and update frequency of your RAW_TABLE, your performance needs, and how up-to-date you need your data to be.

Hi,

Thanks a lot for guidance.  After some more time going into documentation, I believe we can set non-incremental mat views with some data staleness that we can tolerate. I came up with this:

 

CREATE MATERIALIZED VIEW transform.test_mv OPTIONS (
  enable_refresh = true, refresh_interval_minutes = 1,
  max_staleness = INTERVAL "4" HOUR,
    allow_non_incremental_definition = true)
   
AS(

SELECT profile_id,profile_operation_type from (
SELECT profile_id,profile_operation_type, ROW_NUMBER() OVER (PARTITION BY profile_id ORDER BY profile_operation_type, profile_updated_at  ASC) as rn
FROM `data-factory-new-dev.raw.profile_data` )

where rn = 1 and profile_operation_type = 'DELETE')