When it comes to Data Lakehouse Architecture and promoting data from Bronze (Raw) to Gold (Curated), what actual practices have you seen?
Silver (Enriched): what data transformations happen in Silver? Is the data model kept close to the source data model, or is it remodeled to present enterprise data entities? The Silver zone is where I found the most variation in definitions across publications.
Data Persistence: where and when should data be persisted, and where should views or cached views be used instead, across Bronze (Raw), Silver (Enriched), and Gold (Curated)? Let's assume all zones are represented in BigQuery (BQ).
Thank you for sharing your thoughts.
Just as an example, here is the Silver definition from Databricks: "In the Silver layer of the lakehouse, the data from the Bronze layer is matched, merged, conformed and cleansed ("just-enough") so that the Silver layer can provide an "Enterprise view" of all its key business entities, concepts and transactions. (e.g. master customers, stores, non-duplicated transactions and cross-reference tables)." But in the hands-on content I have read (blogs and so on), Silver does not provide an "enterprise view".
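To make the Databricks wording concrete, here is a minimal sketch of how I read "matched, merged, conformed and cleansed" for transactions. Everything in it is an assumption for illustration: the `BronzeTxn` fields, the `CUSTOMER_XREF` cross-reference table, and the `to_silver` function are hypothetical names, not taken from Databricks or Google material, and a real pipeline would run in SQL or Spark rather than plain Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BronzeTxn:
    """Hypothetical raw record, kept close to the source system's shape."""
    source: str        # originating system, e.g. "pos" or "web"
    txn_id: str
    customer_ref: str  # source-specific customer key
    amount: str        # raw text as delivered by the source
    ingested_at: str   # ISO-8601 timestamp, so string comparison orders correctly


# Hypothetical cross-reference table: (source, source key) -> master customer id.
CUSTOMER_XREF = {
    ("pos", "C-17"): "CUST-001",
    ("web", "u_93"): "CUST-001",
}


def to_silver(rows):
    """Deduplicate, cleanse, and conform Bronze rows into Silver rows."""
    # Merge: keep only the most recently ingested copy of each transaction.
    latest = {}
    for row in rows:
        key = (row.source, row.txn_id.strip())
        if key not in latest or row.ingested_at > latest[key].ingested_at:
            latest[key] = row

    silver = []
    for (source, txn_id), row in latest.items():
        silver.append({
            "source": source,
            "txn_id": txn_id,
            # Match/conform: map the source key to the master customer id
            # (None when the cross-reference has no entry).
            "customer_id": CUSTOMER_XREF.get((source, row.customer_ref)),
            # Cleanse: parse the raw text amount into a number.
            "amount": round(float(row.amount), 2),
        })
    return sorted(silver, key=lambda r: (r["source"], r["txn_id"]))
```

In this reading, the "enterprise view" part is the mapping to a master customer id; a Silver layer that only deduplicates and fixes types would stay much closer to the source model.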
Google's "Building the analytics lakehouse on Google Cloud" whitepaper uses a Raw/Enriched/Curated zone approach, but the Silver equivalent is not specifically defined, apart from: "In the Enriched Zone, schema is well enforced, data governance and quality rules have been applied on this data (e.g., sensitive data is anonymized), and data is cleansed and optimized for most common consumption patterns".
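As a rough illustration of that Enriched Zone description, here is a small sketch of schema enforcement plus masking of a sensitive field. The `SCHEMA`, `SENSITIVE`, `pseudonymize`, and `enrich` names are hypothetical and not from the whitepaper. Note also that a salted hash is pseudonymization rather than full anonymization, so treat it as a stand-in for whatever governance rule actually applies.

```python
import hashlib

# Hypothetical target schema for the Enriched zone: field name -> expected type.
SCHEMA = {"customer_id": str, "email": str, "age": int}

# Hypothetical set of fields that governance rules mark as sensitive.
SENSITIVE = {"email"}


def pseudonymize(value: str, salt: str) -> str:
    """Replace a sensitive value with a salted SHA-256 digest (hex string)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def enrich(record: dict, salt: str) -> dict:
    """Enforce the schema on one record and mask its sensitive fields."""
    # Schema enforcement: reject records with missing or unexpected fields.
    if set(record) != set(SCHEMA):
        raise ValueError(f"fields {sorted(record)} do not match schema {sorted(SCHEMA)}")

    out = {}
    for name, expected_type in SCHEMA.items():
        value = record[name]
        # Schema enforcement: reject values of the wrong type.
        if not isinstance(value, expected_type):
            raise TypeError(f"{name} should be {expected_type.__name__}")
        # Governance rule: sensitive fields never reach the Enriched zone in clear text.
        out[name] = pseudonymize(value, salt) if name in SENSITIVE else value
    return out
```

Nothing here says anything about the data model, which is the point of my question: this definition is satisfied by tables that still mirror the source one-to-one.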