Auto-Switch between Dev & Prod Schema and PDTs - Page 2

matthiasa · 11-12-2020 01:35 AM

I would like to share an approach to auto-switch between development and production datasets on a single looker instance that still works with persistent derived tables. Why is this relevant? Having separate environments is crucial for a controlled development process that does not interfere with production. And while you can create PDTs already downstream with dbt, you may want to use aggregate awareness (and PDTs) to dynamically speed up queries.

This post describes a very elegant approach to auto-switch between development and production datasets by using ‘if dev’ and user attributes in your views:

view: your_view {

  sql_table_name:
  -- if dev -- {{_user_attributes['development_schema']}}.your_table
  -- if prod -- analytics.your_table
  ;;

When in production mode, the production schema analytics is queried. When in development mode, the developer’s development schema is queried. This way each developer can work on features in an isolated dataset, not interfering with production or other developers. This is great!

The problem is this does not seem to work with PDTs and therefore not with aggregate awareness. Trying to build an aggregate table on the view above will result in the following warning:

Persistent Native Derived Table for view "your_explore::your_view" references user attributes. Persisting this table may result in unexpected behavior.

It’s stated in the docs that liquid and user attributes won’t work with PDTs.

So we have to take another approach when we want to use aggregate awareness and still keep environments separate. Instead of referencing the user attribute in the view we can do this in the database connection settings:

This means if no dataset is provided as part of the table reference, the development schema maintained in the user attribute is used (here dev_matthias) and the production schema analytics is configured as PDT override. In the view file we can simple remove the user attributes but still point to our production dataset when in production mode:

view: your_view {

  sql_table_name:
  -- if dev -- your_table
  -- if prod -- your_production_schema.your_table
  ;;

Now we can auto-switch between development and production and user aggregate awareness.

But also this approach comes with a drawback, at least when you are on BigQuery: while in development mode the dataset won’t be included in the table references of the generated SQL anymore. So why is this a problem? My preferred workflow for debugging queries was to copy the generated SQL over to the BigQuery UI since you get error messages highlighted and can use auto-completion, features which do not seem to be available in SQL runner. But as we cannot set default datasets in the BigQuery UI, it is not possible to copy the code over without manually adding the table references.

What is your approach to separate environments (on a single looker instance)?