When you come across JSON objects in Postgres and Snowflake, the obvious thing to do is to use a JSON parsing function to select JSON keys as LookML dimensions. You’ll be able to create a lot of dimensions without any issues, but there are some nuances to note. This article will outline the process of defining dimensions and iron out some issues you may encounter in the process.
Say there is a dimension defined as the following:
Postgres or Snowflake:
- dimension: test_json_object
  sql: ${TABLE}.test_json_object
If you select this dimension in an explore, you’ll get something like this:
{"key1":"abc","key2":"a44g6jX3","key3":"12345","key4":"2015-01-01 12:33:24"}
Now we know that our new dimensions are key1 through key4, with data types ranging from string values to a timestamp.
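If you don't know the key names ahead of time, Snowflake can enumerate them for you with LATERAL FLATTEN. A sketch, run directly against the database (my_table is a placeholder for your actual table name):

```sql
-- Snowflake: list every key/value pair inside the JSON object column
SELECT f.key, f.value
FROM my_table,
     LATERAL FLATTEN(input => my_table.test_json_object) f;
```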
To define key2 in the above-described example, you would write the following:
Postgres:
- dimension: key2
  sql: json_extract_path(${TABLE}.test_json_object, 'key2')
Snowflake:
- dimension: key2
  sql: ${TABLE}.test_json_object:key2
This will return a string value by default, still in quotes:
"a44g6jX3"
Now that we have a baseline list of dimensions defined, we'll explicitly cast these dimensions to the appropriate data types in the sql parameter:
Postgres:
- dimension: key2
  sql: CAST(json_extract_path_text(${TABLE}.test_json_object, 'key2') AS varchar)
Snowflake:
- dimension: key2
  sql: ${TABLE}.test_json_object:key2::string
This will now result in quotes being removed:
a44g6jX3
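In Postgres, the quote-free result comes from the _text variant of the extraction function, which returns text rather than json. Comparing the two on an inline literal:

```sql
-- json_extract_path returns a json value, quotes included
SELECT json_extract_path('{"key2":"a44g6jX3"}'::json, 'key2');       -- "a44g6jX3"
-- json_extract_path_text returns text, quotes stripped
SELECT json_extract_path_text('{"key2":"a44g6jX3"}'::json, 'key2');  -- a44g6jX3
```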
Simply declaring a LookML dimension type (string, number, etc.) may NOT remove the quotes (specifically in Snowflake). Even worse, if you have an integer dimension defined as follows (type declared, but not explicitly cast)…
Snowflake:
- dimension: key3
  type: number
  sql: ${TABLE}.test_json_object:key3
… you risk obtaining nulls for that dimension.
As such, explicitly casting data types at this stage is crucial.
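A minimal fix for the Snowflake example above is to cast the VARIANT to an integer in the sql parameter itself, before Looker applies the declared type:

```
- dimension: key3
  type: number
  sql: ${TABLE}.test_json_object:key3::integer
```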
This is an extension of step 3. LookML offers native timestamp conversion with a dimension_group and a type: time declaration:
Postgres and Snowflake (Snowflake extraction syntax shown):
- dimension_group: test
  type: time
  timeframes: [time, date, week, month, year]
  sql: ${TABLE}.test_json_object:key4
While this might work for dates in Snowflake, you will most likely see errors cropping up when you try to select Time, Month, Year, etc.
Instead of assuming accurate timestamp handling by Looker, you should explicitly cast your newly defined dimensions, in this case as follows:
Postgres:
- dimension_group: new_time
  type: time
  timeframes: [time, date, week, month, year]
  sql: CAST(json_extract_path_text(${TABLE}.test_json_object, 'key4') AS timestamp)
Snowflake:
- dimension_group: new_time
type: time
timeframes: [time, date, week, month, year]
sql: ${TABLE}.test_json_object:key4::timestamp
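If some rows may hold malformed timestamp strings, a ::timestamp cast will error at query time. Snowflake's TRY_TO_TIMESTAMP returns NULL instead; a sketch (the intermediate ::string cast is needed because the function expects a string argument, not a VARIANT):

```
- dimension_group: new_time
  type: time
  timeframes: [time, date, week, month, year]
  sql: try_to_timestamp(${TABLE}.test_json_object:key4::string)
```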