Mastering Your Data with Medallion Architecture: The Three-Layer Design for Data Management
In my previous article, Benchmarking database architectures: Data Warehouse, Data Lake and Data Lakehouse, I compared those three database architectures.
Data is the backbone of any organization, and properly organizing and managing it is critical to ensuring its practical use. One way to organize and manage data is to use a Data Lakehouse architecture.
The objective of this article is to look at the Data Lakehouse architecture in more detail through one of its design patterns, Medallion Architecture, and to show how it fits current state-of-the-art practice, especially in the context of data processing approaches.
Medallion Architecture is one of the Data Lakehouse design patterns. When deployed, it gives data a simple, predictable path through a fixed sequence of Data Lakehouse layers. At each layer, the data and its structure are augmented, enhanced, cleaned and aggregated, so that end users are finally presented with high-quality data products that can be used for Business Intelligence reporting and Machine Learning.
A Medallion Architecture consists of three layers: Bronze, Silver and Gold. Data flows from one layer to the next, moving gradually from raw data, which may be unstructured, to high-quality, refined data that is ready to be used.
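To make the flow concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the in-memory lists stand in for Lakehouse tables, and the field names (order_id, customer, amount), the helper functions to_silver and to_gold, and the cleaning rules are all hypothetical, not part of any specific platform.

```python
from collections import defaultdict

# Bronze: raw records kept as ingested (hypothetical sample data, all strings).
bronze = [
    {"order_id": "1", "customer": " alice ", "amount": "10.50"},
    {"order_id": "2", "customer": "Bob", "amount": "n/a"},        # malformed amount
    {"order_id": "1", "customer": " alice ", "amount": "10.50"},  # duplicate of order 1
    {"order_id": "3", "customer": "alice", "amount": "4.50"},
]


def to_silver(rows):
    """Silver: validate, cast to proper types, normalize and deduplicate."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # assumption: rows with an unparseable amount are dropped
        order_id = int(row["order_id"])
        if order_id in seen_ids:
            continue  # keep only the first occurrence of each order
        seen_ids.add(order_id)
        cleaned.append(
            {
                "order_id": order_id,
                "customer": row["customer"].strip().lower(),
                "amount": amount,
            }
        )
    return cleaned


def to_gold(rows):
    """Gold: aggregate into a business-level view (total spend per customer)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'alice': 15.0}
```

The point of the sketch is the direction of travel: Bronze keeps everything as it arrived, Silver applies cleaning rules once so that downstream consumers do not have to, and Gold exposes only the aggregated result that a report or model would read.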
Let’s take a closer look at each layer: