Need a way to incrementally build a PDT that's using create_process

Hi Everyone,

We have a PDT that is built using create_process and contains a few sql_steps. This PDT is becoming large and very time-consuming to build fully every day.

We want to build it incrementally but that is not possible to do as mentioned in the docs: https://cloud.google.com/looker/docs/reference/param-view-create-process#tables_defined_with_create_...

We thought we might be able to make a custom incremental build but it's proving very challenging. Anyone have a suggestions, or have experienced this problem before?

Thanks

0 3 89
3 REPLIES 3

Hello kt_grist,

Yes, you should be able to do your own incremental builds to the PDT with create_process.

The equivalent of looker's incremental PDT management is that it deletes the last increment_offset entries in the increment_key field, then merges in results filtered since that earliest increment_key value that was deleted. You should be able to do something similar:
step 1 deletes recent data (to ensure no duplication)
step 2 add data (filtered to recent records only in source query)

Side note: The date fields used for filtering should be partitioned or indexed so that these queries able to minimize data scanned to only relevant data, else there's likely no benefit to doing this incremental step (i.e. your step 2 will still be long sing you are scanning all data in source)

I hope this helps! 

-Kevin

Thanks for your reply. We have tried the approach you suggested but the issue we are facing is that we are unable to reference the same table within the PDT definition. 

create_process requires we create the table referenced as ${SQL_TABLE_NAME} which creates a whole new table. When I have tried to reference the old version of the table using ${pdt_name.SQL_TABLE_NAME} I get an error. Do you know a way around this?

kt_grist_0-1746800552140.png

 

Hello kt_grist,

You are right it is a complicating factor that Looker is specifically expecting you to define creation of a new table at ${SQL_TABLE_NAME}.  Below I share an approach that appears to work around that difference between what Looker create_process is expecting vs your goal (avoid building an entire table).  I do ponder if this type of ETL process might also be well managed external to Looker PDT process (maybe so, if for example the PDT definition is static and doesn’t need to be controlled by LookML developers). 

That considered, I hope this approach helps if it is needed:

---

datagroup: example_datagroup_daily_build_at_midnight {sql_trigger:select date_trunc(current_timestamp(), DAY);;}

#Note: I did an initial build of the target table referenced below manually (test_incremental_pdt in the PDT dataset). This works incrementally from there on an existing table
view: create_process_for_custom_incremental_PDT_test {
derived_table: {
datagroup_trigger: example_datagroup_daily_build_at_midnight
create_process: {
# FOR INCREMENTAL LOAD: FIRST DELETE ANY DATA THAT ALREADY EXISTS FOR PERIOD WE PLAN TO INCREMENTALLY RELOAD
# HOW FAR BACK TO LOAD(AND THUS DELETE IN THIS STEP) MAY DEPEND ON HOW LATE DATA COULD ARRIVE, AND SHOULD COVER PERIOD SINCE PRIOR TRIGGER EVENT
#. MUST BE PARTITIONINIG ON THIS DATE FIELD ELSE THIS WILL SCAN WHOLE TABLE
sql_step:
DELETE FROM `test_incremental_pdt`
WHERE a_date >= date_add(CURRENT_DATE(),INTERVAL -1 DAY);;

# LOAD DATA FROM SOURCE INTO TARGET TABLE
# REPLACE WITH YOUR ACTUAL DERIVED TABLE LOGIC IN THE INNER QUERY
# SHOULD HAVE A WHER CLAUSE FILTERING ON SAME PERIOD OF TIME THAT WAS DELETED IN PRIOR STEP
sql_step:INSERT INTO `test_incremental_pdt` (
SELECT a_date ,1 as value from (select CURRENT_DATE() as a_date ) as dummy_table
WHERE a_date >= date_add(CURRENT_DATE(),INTERVAL -1 DAY);;
);;
# LOOKER EXPECTS YOU TO DEFINE A WHOLE NEW TABLE AND LAND IT IN A DYNAMICALLY GENERATED LOCATION.
# SO THAT SUBSEQUENT LOOKER REFERENCES IN EXPLORE CAN USE THE DYNAMIC NAME WE"LL GENERATE... WE'LL LAND A VIEW THERE INSTEAD (WHICH ALWAYS POINTS TO OUR PERMANENT TABLE)
sql_step:CREATE OR REPLACE VIEW ${SQL_TABLE_NAME} as (select * from `kitchenTablePDTs.test_incremental_pdt`);;
}
}
#dimensions corresponding to my Proof Of Concept
dimension: a_date {type:date}
dimension: value {}

}
explore: create_process_for_custom_incremental_PDT_test {}

 

 

Top Labels in this Space