
Medallion Architecture v1


Medallion Architecture

Jun 2024

© 2024 Cognizant
Introduction
• A medallion architecture is a data design pattern used to logically organize data in a
Lakehouse, with the goal of incrementally and progressively improving the structure and
quality of data as it flows through each layer of the architecture.

• Medallion architectures are sometimes also referred to as "multi-hop" architectures.

• When its layers are stored in a transactional table format such as Delta Lake, this
architecture guarantees atomicity, consistency, isolation, and durability as data
passes through multiple layers of validations and transformations before being stored in
a layout optimized for efficient analytics. The terms bronze (raw), silver (validated), and
gold (enriched) describe the quality of the data in each of these layers.

• Databricks coined the term "medallion architecture". Databricks recommends taking a
multi-layered approach to building a single source of truth for enterprise data products.
Data Layers: Bronze, Silver and Gold
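
The progression through the three layers can be illustrated with a minimal sketch. It uses plain Python lists instead of Delta tables so that it runs anywhere; the order records, the field names, and the helper functions `to_silver` and `to_gold` are hypothetical and not taken from this deck. In a Lakehouse each step would typically be a Spark job writing to a Delta table.

```python
from collections import defaultdict

# Bronze: hypothetical raw records, stored exactly as they arrived.
bronze = [
    {"order_id": "1", "region": "EU", "amount": "10.50"},
    {"order_id": "2", "region": "eu", "amount": "4.50"},
    {"order_id": "2", "region": "eu", "amount": "4.50"},  # duplicate delivery
    {"order_id": "3", "region": "US", "amount": "n/a"},   # unparseable amount
    {"order_id": "4", "region": "US", "amount": "20.00"},
]


def to_silver(rows):
    """Silver: validate, cast types, normalize values, drop duplicates."""
    seen_ids = set()
    silver_rows = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows that fail validation
        order_id = int(row["order_id"])
        if order_id in seen_ids:
            continue  # keep only the first copy of each order
        seen_ids.add(order_id)
        silver_rows.append(
            {"order_id": order_id, "region": row["region"].upper(), "amount": amount}
        )
    return silver_rows


def to_gold(rows):
    """Gold: aggregate validated rows into an analytics-ready summary."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

Note that the bronze list is never modified: each layer is derived from the one before it, which is what allows quality to improve step by step.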

Logical Architecture (Overall View)
[Diagram: three tiers over a shared store, with governance and supporting services underneath. Labels recovered from the figure:]

• Lake Ingestion Tier: sources (Databases, APIs, Files) feeding an Ingestion Platform (Metadata Driven Ingestion Environment, Global/Sector-Based)
• Lake Model & Transform Tier: Federated Platforms (SparkSQL Environment, Global/Sector-Based)
• Serving Tier: Real-time Analytics, Synapse SQL, Analytical Applications, Presto, Machine Learning
• Store: Centralized Store (today); future: decentralized. Zones: Landing, Raw/Standardized (Delta Format), Silver (Delta Format), Gold (Delta Format)
• Trusted Data Governance: Enterprise Data Catalog, Data Sharing. Data Map = Data Assets | Classifications | Business Context
• Supporting Services: Security & Access (Azure AD, Key Vault), Monitoring (Monitor), Dev/Ops (DevOps)



Advantages
• Simple data model
• Logical progression of data cleanliness
• Allows you to recreate any downstream tables from raw
sources

Disadvantages
• Uses large amounts of storage
• Often requires additional downstream processing
• Implies a data Lakehouse architecture
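
The trade-off between recreating downstream tables from raw sources and using more storage can be sketched as follows. This is an illustration under assumed data: the sensor readings, the `build_silver` helper, and the thresholds are hypothetical. Because the bronze rows are kept untouched, the silver table can be rebuilt with a different validation rule at any time, at the cost of storing both the raw and the derived copies.

```python
# Bronze: hypothetical raw sensor readings, retained unchanged.
bronze_readings = [
    {"sensor": "a", "celsius": "21.5"},
    {"sensor": "b", "celsius": "-999"},  # sentinel value for a failed reading
    {"sensor": "c", "celsius": "48.0"},
]


def build_silver(raw_rows, max_celsius):
    """Derive a validated table from raw rows using the given upper bound."""
    silver_rows = []
    for row in raw_rows:
        value = float(row["celsius"])
        if -50.0 <= value <= max_celsius:
            silver_rows.append({"sensor": row["sensor"], "celsius": value})
    return silver_rows


# First version of the validation rule.
silver_v1 = build_silver(bronze_readings, max_celsius=40.0)

# The rule turns out to be too strict; rebuild silver from the same raw data.
silver_v2 = build_silver(bronze_readings, max_celsius=60.0)

print([r["sensor"] for r in silver_v1])
print([r["sensor"] for r in silver_v2])
```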

Medallion architecture and data mesh


• The Medallion architecture is compatible with the concept of a data mesh. Bronze and silver tables
can be joined together in a "one-to-many" fashion, meaning that the data in a single upstream table
could be used to generate multiple downstream tables.
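
The "one-to-many" relationship described above can be sketched with a single upstream table that feeds two independent downstream tables. The table contents and the function names `revenue_by_region` and `orders_per_customer` are hypothetical; in a data mesh, each downstream table might be owned by a different domain team.

```python
from collections import Counter, defaultdict

# One upstream (silver) table, shown here as a hypothetical list of rows.
silver_orders = [
    {"customer": "acme", "region": "EU", "amount": 10.0},
    {"customer": "acme", "region": "US", "amount": 5.0},
    {"customer": "globex", "region": "EU", "amount": 2.5},
]


def revenue_by_region(rows):
    """Downstream table 1: total order amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def orders_per_customer(rows):
    """Downstream table 2: number of orders per customer."""
    return dict(Counter(row["customer"] for row in rows))


# The same upstream table generates multiple downstream tables.
gold_revenue = revenue_by_region(silver_orders)
gold_order_counts = orders_per_customer(silver_orders)
print(gold_revenue)
print(gold_order_counts)
```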
Logical Architecture (Security View – System/User Access)

[Diagram labels: Primary Data Access Platforms; Store with zones Landing, Raw/Standardized (Delta Format), Silver (Delta Format), Gold (Delta Format)]

Security Persona: Description

• Developers/Data Engineers: Leverage Azure to build, integrate, and manage data and analytics products. Create AI enabled applications / solutions when applicable.
• Data Analysts: Leverage Azure to discover and share new insights from existing data assets or ad hoc data.
• Data Scientists: Use your preferred tools and machine learning frameworks to build scalable data science solutions. Accelerate end-to-end ML lifecycle.
• Data Stewards: Responsible for managing access to data assets. Authorization control – approve access to data sets or delegate that approval accordingly.

Thank you
