Creating couple columns from a single nested field

Hi,

We store event log properties in nested fields, as shown below:

 

 

dimension: message_number {
    type: number
    view_label: "Event Properties"
    sql: (
                                select
                                 logs__properties__value.int_value from
                                  UNNEST(${TABLE}.properties) logs__properties
                                  LEFT JOIN UNNEST([logs__properties.value]) logs__properties__value where  lower(logs__properties.key) = 'message_number') ;;
  }
dimension: character_count {
    type: number
    view_label: "Event Properties"
    sql: (
                        select
                         logs__properties__value.int_value from
                          UNNEST(${TABLE}.properties) logs__properties
                          LEFT JOIN UNNEST([logs__properties.value]) logs__properties__value where  lower(logs__properties.key) = 'character_count') ;;
  }

 

 

The problem is that when we create measures using these dimensions, they generate multiple subqueries, which are inefficient in BigQuery.

Since event properties are dynamic, is there a way to UNNEST only once while still allowing access to multiple properties efficiently?

Thank you,

Berat S.

0 1 68
1 REPLY 1

Hello Berat,

I would likely need to do some testing/trial an error with your data and use-case to understand the pros-and-cons of some alternate approaches...  But at a high level...

1) Please check this related article how to model nested bigquery data in Looker. With this approach, you handle repeated structs as a separate view, and join it with unnest in the explore definition and a one_to_many relationship.   In my experience, this approach is pretty intuitive and should definitely be used if you expect to use various details from the unnested rows..

2) Counterpoint... often the goal is to extract info about the parent record, and unnesting in the way described in #1 causes data from the parent record to be repeated...  In that case I have found that doing field-level subqueries produces the desired results and is fast/efficient...  So, I find that doing something similar to what you are doing is actually more practical and efficient.
(Note that, in my experience, often in this approach you should be applying an aggregation - e.g. something like any_value(logs__properties__value.int_value having MAX case when lower(logs__properties.key) = 'character_count'  then 1 else null end) - in the subquery to efficiently ensure you produce only a single dimensional value for each parent record... though, if your queries are not failing, it must be the case that your subquery is already only producing a single value for each parent record). 

So.. what to do?
In my opinion, without looking at your data, if what you have shown is producing the dimensional values as needed for the related parent records I don't know that there is much improvement that can be achieved, if you continue doing this flattening at query time.  But what WOULD help is to explicitly extract these relevant characteristics of the parent into regular parent-level columns in advance of the end-user's query.  This would involve using similar logic to what you have written in an ETL step (or Looker Persistent Derived Table).  

I hope this helps!

Top Labels in this Space
Top Solution Authors