Recomendation: JSON vs repeated key-value records ...

mdomke · 02-14-2024 09:28 AM

I'm in the process of designing a data warehouse for BI purposes, where I want to ingest events from different applications. All events have a number of common fields to identify its source, the associated user and client and so on. The actual event data is however dependent on the specific event type. I'm trying to understand what is the best approach to design a schema for this purpose. I saw that the Google Analytics Export schema for BigQuery uses a repeated record with key and value (where the value is a struct itself with fields for different value types, e.g. int, float, string). Another obvious solution would be to use a JSON field for the unstructured data. Can somebody explain to me what are the pros and cons for either approach? Is the key-value-solution more efficient when it comes to creating queries on the data? Is the JSON field the more modern approach and isn't used in the Google Analytics usecase because it is a type that was added more recently? Any guidance would be greatly appreciated!

BilelBS

Hi @mdomke

did you get some news about your post, if yes would you share it please ?

Also, i am facing an issue that i posted here, if you can help it would be greatly appriciated!

https://www.googlecloudcommunity.com/gc/Data-Analytics/Transform-RawData-Bq-table-to-Json-Bq-table/m...

Thanks for your actions.

Recomendation: JSON vs repeated key-value records for unstructured data