So I am exploring fully using BQ as my primary storage for the Data Lakehouse architecture pattern.
However, one of the main features of the Data Lakehouse is that its first layer (raw, bronze, whatever it's called) is schema-on-read.
Are there any approaches where I could use BQ for my RAW layer with a schema-on-read approach?
Has anyone seen / done this?
For example, I am loading data from an RDBMS (MSSQL, Oracle) via a BQ connector, and even if a column changes its data type, a new column is added, or a column is removed, everything works and the data is ingested into BQ just fine. Meaning at this RAW stage I don't have to worry about managing schema evolution.
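To make the behavior I'm after concrete, here's a pure-Python simulation (not the actual connector, and the column names are made up): ingestion that accepts batches whose columns are added, retyped, or dropped, without any schema check at write time.

```python
import json

# Hypothetical simulation of schema-drift-tolerant ingestion (not the real
# BQ connector): each batch may add, drop, or retype columns, and ingestion
# still succeeds because rows are stored as-is with no validation.

raw_table = []  # stand-in for the RAW table

def ingest_batch(rows):
    """Append rows regardless of their shape; no schema validation."""
    for row in rows:
        raw_table.append(json.dumps(row))  # store each row verbatim

# Batch 1: original source schema
ingest_batch([{"id": 1, "amount": 9.99}])
# Batch 2: "amount" changed type to string, new column "currency" appeared
ingest_batch([{"id": 2, "amount": "9.99", "currency": "EUR"}])
# Batch 3: "amount" removed entirely
ingest_batch([{"id": 3, "currency": "USD"}])

print(len(raw_table))  # all three batches landed without schema errors
```

This is what "no schema management at the RAW stage" means in practice: the write path never rejects a row for shape reasons.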
Thank you, Vaiva!
Loading data from a RDBMS into BigQuery using a custom schema file:
Streaming data from a RDBMS to BigQuery:
@ms4446 thank you for your answer.
In its true sense, schema-on-read storage would mean that I can load and store whatever arrives, ideally a data file plus a metadata file. However, if I set up a BQ table with schema auto-detect, or with a schema file, then the next time data is written to BQ the schema is checked and some records will be rejected as exceptions. That means schema validation is going on, so it's not really schema-on-read.
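One pattern that does give genuine schema-on-read in BQ is to land every record as a raw payload in a single STRING (or JSON-typed) column alongside ingestion metadata, and only project typed columns at read time with JSON functions in a view. A minimal Python sketch of that layout (the table shape and function names are my own illustration, not an official API):

```python
import json
from datetime import datetime, timezone

# Hypothetical schema-on-read layout: one payload column plus metadata columns.
# In BigQuery this would be a table like (ingested_at TIMESTAMP, source STRING,
# payload JSON), with typed views applying JSON_VALUE / SAFE_CAST on read.

def to_raw_row(source: str, record: dict) -> dict:
    """Wrap any source record as a raw row; its schema is never checked."""
    return {
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "payload": json.dumps(record),  # schema lives only inside the payload
    }

def read_field(raw_row: dict, field: str):
    """Schema-on-read: extract a field at query time; missing fields become None."""
    return json.loads(raw_row["payload"]).get(field)

row = to_raw_row("mssql.orders", {"id": 1, "status": "shipped"})
print(read_field(row, "status"))    # shipped
print(read_field(row, "discount"))  # None - an absent column is a read-time concern
```

With this layout the write path always succeeds, because the only schema BQ validates is the fixed payload-plus-metadata wrapper, never the source schema inside it.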
One idea was to load all columns as STRING; then schema validation would always pass, but the storage might not be optimal.