Data Mesh in Action
About this ebook
In Data Mesh in Action you will learn how to:
Implement a data mesh in your organization
Turn data into a data product
Move from your current data architecture to a data mesh
Identify data domains, and decompose an organization into smaller, manageable domains
Set up the central governance and local governance levels over data
Balance responsibilities between the two levels of governance
Establish a platform that allows efficient connection of distributed data products and automated governance
Data Mesh in Action reveals how this groundbreaking architecture looks for both small startups and large enterprises. You won’t need any new technology—this book shows you how to start implementing a data mesh with flexible processes and organizational change. You’ll explore both an extended case study and multiple real-world examples. As you go, you’ll be expertly guided through discussions around Socio-Technical Architecture and Domain-Driven Design with the goal of building a sleek data-as-a-product system. Plus, dozens of workshop techniques for both in-person and remote meetings help you onboard colleagues and drive a successful transition.
About the technology
Business increasingly relies on efficiently storing and accessing large volumes of data. The data mesh is a new way to decentralize data management that radically improves security and discoverability. A well-designed data mesh simplifies self-service data consumption and reduces the bottlenecks created by monolithic data architectures.
About the book
Data Mesh in Action teaches you pragmatic ways to decentralize your data and organize it into an effective data mesh. You’ll start by building a minimum viable data product, which you’ll expand into a self-service data platform, chapter-by-chapter. You’ll love the book’s unique “sliders” that adjust the mesh to meet your specific needs. You’ll also learn processes and leadership techniques that will change the way you and your colleagues think about data.
What's inside
Decompose an organization into manageable domains
Turn data into a data product
Set up central and local governance levels
Build a fit-for-purpose data platform
Improve management, initiation, and support techniques
About the reader
For data professionals. Requires no specific programming stack or data platform.
About the authors
Jacek Majchrzak is a hands-on lead data architect. Dr. Sven Balnojan manages data products and teams. Dr. Marian Siwiak is a data scientist and a management consultant for IT, scientific, and technical projects.
Table of Contents
PART 1 FOUNDATIONS
1 The what and why of the data mesh
2 Is a data mesh right for you?
3 Kickstart your data mesh MVP in a month
PART 2 THE FOUR PRINCIPLES IN PRACTICE
4 Domain ownership
5 Data as a product
6 Federated computational governance
7 The self-serve data platform
PART 3 INFRASTRUCTURE AND TECHNICAL ARCHITECTURE
8 Comparing self-serve data platforms
9 Solution architecture design
Jacek Majchrzak
Jacek Majchrzak is a hands-on lead architect in the area of drug discovery where he implements the data mesh idea. Jacek is a workshop facilitator with a strong focus on domain-driven design, software architecture and socio-technical systems design.
inside front cover
Data mesh development elements—data product development cycle details
Data Mesh in Action
Jacek Majchrzak, Sven Balnojan, and Marian Siwiak, with Mariusz Sieraczkiewicz
Foreword by Jean-Georges Perrin
Manning
Shelter Island
For more information on this and other Manning titles go to
www.manning.com
Copyright
For online information and ordering of these and other Manning books, please visit www.manning.com. The publisher offers discounts on these books when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com
©2023 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
♾ Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
ISBN: 9781633439979
contents
Front matter
foreword
preface
acknowledgments
about this book
about the authors
about the cover illustration
Part 1. Foundations
1 The what and why of the data mesh
1.1 Data mesh
1.2 Why the data mesh?
Alternatives
Data warehouses and data lakes inside the data mesh
Data mesh benefits
1.3 Use case: A snow-shoveling business
1.4 Data mesh principles
Domain-oriented decentralized data ownership and architecture
Data as a product
Federated computational governance
Self-serve data infrastructure as a platform
1.5 Back to snow shoveling
1.6 Socio-technical architecture
Conway’s law
Team topologies
Cognitive load
1.7 Data mesh challenges
Technological challenges
Data management challenges
Organizational challenges
2 Is a data mesh right for you?
2.1 Analyzing data mesh drivers
Business drivers
Organizational drivers
Domain-data drivers
Minor organizational drivers
Is a data mesh a good fit for me?
2.2 Data mesh alternatives and complementary solutions
Enterprise data warehouse
Data lake
Data lakehouse
Data fabric
Data mesh vs. the rest of the world
2.3 Understanding a data mesh implementation effort
The data mesh development cycle
Development cycle in the shoveling example
Enabling the team
Development cycle in detail
3 Kickstart your data mesh MVP in a month
3.1 Getting the lay of the land
Drawing a system landscape diagram
Performing stakeholder analysis
3.2 Identifying candidates for the MVP implementation team
Choosing development teams
Choosing the cooperation model
Choosing a data governance team
3.3 Setting up MVP governance
Defining data mesh value statement(s)
Defining data governance policies
Federating data governance
3.4 Developing minimal data products
Identifying domain-oriented datasets
Choosing data product owners
Deciding on the minimum viable data product description
Developing the simplest tools to expose your data
3.5 Setting up the minimal platform
Ensuring platform-forced governability
Ensuring platform security
Part 2. The four principles in practice
4 Domain ownership
4.1 Capturing and analyzing domains
Domain-driven design 101
Invite the right people
Choose the correct workshop technique
4.2 Applying ownership using domain decomposition
Domain, subdomain, and business capability
Decompose domains using business capability modeling
How are domains and business capabilities related to data?
Assign responsibilities to the data-product-owning team
Choose the right team to own data
4.3 Applying ownership using data use cases
Data use cases
Model and bounded context
Set up boundaries of use-case-driven data products
Choose the right team to own data
4.4 Applying ownership using design heuristics
What is a heuristic?
Using design heuristics
Designing heuristics and possible boundaries
4.5 Final landscape: The mesh of interconnected data products
Messflix data mesh
Data products form a mesh
Is it already a data mesh?
5 Data as a product
5.1 Applying product thinking
Product thinking analysis
Data product canvas
5.2 What is a data product?
Data product definition
Product, not project
What can be a data product?
5.3 Data product ownership
Data product owner
Data product owner responsibilities
An Agile DevOps team as a base for data product dev team
Data product owner and product owner
5.4 Conceptual architecture of a data product
External architecture view
Internal architecture view
5.5 Data product fundamental characteristics
Self-described data product
Introduction to metadata
Metadata as code
Data product metadata
Domain dataset metadata
Other kinds of metadata
5.6 Additional data product characteristics: FAIR and immutability
Findability
Accessibility
Interoperable
Reusable
Immutable
5.7 Data contracts and sharing agreements inside the data mesh
Data contracts and sharing agreements
Implementing data contracts and sharing agreements
6 Federated computational governance
6.1 Data governance in a nutshell
6.2 Benefits of data governance
Business value perspective
Data usability perspective
Data control perspective
6.3 Planning data governance outcomes
Hierarchy of data governance outcomes
Strategic-level outcomes
Tactical-level outcomes
Implementation-level outcomes
6.4 Federating data governance
Thinking of data governance in terms of sliders
Extreme ends of data governance models
Federated data governance model
Setting-up governance team operations
6.5 Making data governance computational
Making policies computational
Automating policy checks
7 The self-serve data platform
7.1 The MVP platform
Platform definition
Platform thinking
7.2 Improvements with X as a service
X as a service explained
X as a service applied
7.3 Improvements with platform architecture
Platform architecture explained
Platform architecture applied
7.4 Improvements for the data producers
Part 3. Infrastructure and technical architecture
8 Comparing self-serve data platforms
8.1 Data mesh on Google Cloud Platform
Self-serve data platform architecture
Identifying the components of the platform
Identifying the components of the data product
Workflows
Variations
Relation to data mesh ideas
GCP architecture summary
8.2 Data mesh on AWS
Self-serve data platform architecture
Identifying the components of the platform
Identifying the components of the data products
Workflows
Relation to data mesh ideas
Variations
AWS architecture summary
8.3 Data mesh on Databricks
Self-serve data platform architecture
Identifying the components of the platform
Identifying the components of the data product
Workflow considerations
Variations
Databricks architecture summary
8.4 Data mesh on Kafka
Self-serve data platform architecture
Identifying the components
Considerations
Kafka architecture summary
9 Solution architecture design
9.1 Capturing and understanding the current state
What is software architecture?
How to document architecture: The C4 model
9.2 Understanding architectural drivers of a data product design
Architectural drivers
Capturing architectural drivers for a data-product design
9.3 Designing the future architecture of a data product and related systems
Design session
File-based data product: Spreadsheet
From monolith and microservice to a data product
Exposing data for stream processing and batch processing
Appendix A.
Appendix B.
Appendix C.
Appendix D.
index
front matter
foreword
The data mesh is to data as agile is to software engineering, or as microservices are to architecture patterns. It will be an essential component of your future data strategy. Data Mesh in Action addresses both the technology of the data mesh and the methodology your organization can follow to implement it.
This book teleports you into the seat of the chief architect on a data mesh project. The authors will coach you through the chaotic process of your first data product. As you gain more and more of those components, your mesh will build itself. The authors’ collective experience drives this transformation. Your responsibility will be to pick, choose, and adapt this framework to your needs and organization.
The data mesh is based on four key principles: domain ownership, data as a product, federated computational governance, and the self-serve data platform. The book details the organizational impact of these principles, as well as their technology, at great length. Individually, all those principles are well known to engineers and architects; the real (r)evolution of the data mesh is its ability to combine them and deliver a global approach to building modern data platforms.
In my more than 15 years of building hybrid data platforms, I have always been missing something. Whether it was due to the strict approach of ingesting data in a warehouse or the lack of governance of a lake, to name two popular patterns, there was always this feeling of “it ain’t gonna work.”
The mesh is different. It does not focus solely on technology; it puts governance and quality at the center and allocates ownership to the real owner, not some central commanding and demanding group. As a result, with adequate self-service tools, the data mesh will liberate the forces of innovation in your organization. And that is what this book will help you achieve.
—Jean-Georges Perrin,
Intelligence platform lead at PayPal,
president and cofounder of AIDAUG,
and Lifetime IBM Champion
preface
Each one of us authors has experienced, at length and at different companies, the “old way of doing data,” usually through centralized data lakes and data warehouses in combination with a set of central teams organized inside an analytics function. The old way basically looked like this:
Multiple decentralized development teams have data that is accessible through storage systems like a shared drive, a decentralized database, a Representational State Transfer (REST) API, or any other interface.
One or more centralized data teams are tasked with collecting this data into one monolithic pot. This is either a data lake or a data warehouse.
The same set of teams is tasked with transforming this data into something useful.
Multiple decentralized analysts, development teams, or machine learning (ML) teams pick up that transformed data and convert it into value in the form of reports, recommendation systems, or anything else they can think of.
We learned the hard way that this concept has its limits, producing a bottleneck in terms of both technology and team capacities. We all saw companies struggling to get the flow from data to value to be as productive as the companies needed it to be. Then the data mesh and the ideas behind it appeared on the horizon.
The data mesh is a decentralization paradigm. It decentralizes the ownership of data, its transformation into information, and its serving. It aims to increase the value extraction from data by removing bottlenecks in the data value stream by these means.
The concept of the data mesh appeared on the stage in 2019 and has since lit not just the data world, but the whole technology world, on fire. The data mesh concept breaks with the current world of data, which usually treats data as a by-product of software components. This new approach turns the spotlight on data producers and gives them the responsibility to handle the data just as they would handle their software.
With this, the data mesh takes the same journey software components have taken, with microservices architectures and with the DevOps movement. It takes the same journey frontends are currently taking with microfrontends. And just as in these examples, we believe that the data mesh is the right approach to finally gain the flexibility to extract value from our data at scale, be that in business intelligence (BI), ML, or any other use case you can think of.
The data mesh concept is often referred to as a socio-technical paradigm shift: its core is not about technology but about the alignment of people, processes, and organizations. This significant complexity is why we wrote this book. However, we don’t just present the available theoretical knowledge that is out there; we focus on parts of the data mesh that are, in our experience, critical for successful implementation. We have organized those parts into a digestible resource to help you put a data mesh in action!
To guide you through the process, we’ve prepared hands-on examples with a lot of architecture sketches, describing various technologies, workshop techniques, team organization forms, and the like. After reading this book, you should be able to do the following:
Evaluate whether a data mesh will suit your organization’s business needs
Lay the groundwork for data mesh development
Develop a minimal data mesh to start your journey
Keep iteratively developing and expanding your data mesh
Don’t expect to find a lot of code in this book, other than a little JavaScript Object Notation (JSON) here and there. That’s because we truly believe the magic is not in the technology, but in the people, processes, and organizations. But, of course, you can expect to find a lot of technology inside this book in the form of deep architecture sketches with reference to various technologies and cloud providers, explanations, and blueprints inspired by multiple real-world examples.
That said, we don’t believe in a black-and-white implementation of the data mesh idea. This book will help you adjust the data mesh idea to your company by offering a lot of degrees of freedom, shortcuts, and a healthy level of pragmatism.
To tie together our experience, we will use an imaginary company called Messflix LLC, which resembles a lot of what we’ve seen out there in the data world. This company will be our go-to example as we go through the “mess-to-mesh” journey; however, since we also focus on making the data mesh adaptable to many types of companies, not just one, this is not the only example we utilize throughout the book. Later in this front matter, we provide a brief introduction to Messflix by taking a look at the data mess the company has gotten itself into.
acknowledgments
First, we would like to express our gratitude to the community engaged with data mesh development. Their discussions and openness about problems and challenges helped us broaden our perspectives and put our particular experiences into the generalized framework you’ll find in this book.
We owe our thanks to the wonderful people at Manning who made this book possible: Publisher Marjan Bace, Development Editor Ian Hough, and last but not least, Acquisitions Editor Andrew Waldron. Without their patience with our ever-evolving view on the data mesh, and their ability to make us synthesize it into a coherent view, we wouldn’t have been able to finish Data Mesh in Action in a form we could so proudly present to you. We would also like to thank the marketing, editorial, and production teams, without whom this book would be gathering dust in a Manning drawer.
A heartfelt thanks also to Michael Jensen and Al Krinker for technical reviews, which allowed us to further condense and clarify data mesh concepts.
We would also like to thank all our reviewers, who trusted us and invested their time in reading this book, even when no one was sure it would make it to publication. To Alain Couniot, Arnaud Castelltort, Arnaud Estève, Jean-Georges Perrin, Juan Gabriel Guzmán Guerra, Mary Anne Thygesen, Massimo dr, Matthias Busch, Mike Fowler, Milan Sarenac, Nathan B. Crocker, Pradeep Bhattiprolu, Rahul Jain, Richard Vaughan, Salil Athalye, Sampath Chaparala, Shiroshica Kulatilake, Simon Tschöke, Stefano Ongarello, Sumih Damodaran, Suriyanto Bongso, and Yi Wei, your suggestions helped make this a better book.
about this book
This book serves two purposes. First, it organizes and presents knowledge about the new socio-technological paradigm of the data mesh. Second, it will help you implement a data mesh. From considering whether the data mesh is a suitable solution for your organization, to laying the groundwork, to developing a minimum viable product (MVP), to implementing data mesh principles, this book provides the tools needed to get you well on your way on your data mesh journey.
Who should read this book?
The most general description of our reader is someone who is involved in extracting value from data. However, because that describes almost everyone in our modern economy, we’ll outline the benefits this book will bring to various audiences.
The first group is people involved in creating, managing, and utilizing data within companies that have the following:
High socio-technological complexity (e.g., big corporations)
Complex data use cases
Many and diverse data sources
This encompasses, but is not limited to, roles including data architects, data engineers, software architects, tech leads, and senior developers.
The more you feel these characteristics apply to your business, the more likely it is that a data mesh could be a good solution. This book will help you understand data mesh concepts, including whose cooperation you need to secure, and what steps to take in both your organization and technical environment to move from a data mess to a data mesh.
Beyond that, as the data mesh is a company-wide transformation process, the book’s content will be directly useful to executive-level personnel, including the technical C-suite, engineering directors and managers, enterprise architects, chief and lead architects, and solution/program owners. This book will help you decide to what extent, and with what priority, you should shift your company’s data environment in a data mesh direction, and help you plan the change management.
How this book is organized: A road map
While the book is meant to be read linearly, it is broken into three main parts and allows you to skip sections. The first part is a quick and hands-on introduction, the second explains the four principles of the data mesh in detail, and the third tackles the technical side of things in detail as well as the complete enterprise journey.
Part 1: Foundations
The goal of the first part of the book is to familiarize you with the data mesh paradigm as quickly as possible. To do so, we first go through the basics of the data mesh and then get our hands dirty by building our first data mesh within a month.
Chapter 1: The what and why of the data mesh
This chapter gives the overview needed to put the rest of the book into the proper context, including why you might want to consider following the data mesh mindset shift as well as a short explanation of the four key principles detailed in part 2.
Chapter 2: Is a data mesh right for you?
This chapter provides you with the context of the data mesh implementation and the drivers to consider when deciding on the transformation. It helps you decide whether you want to start the journey now and to identify your place on the data maturity scale. This helps you to match your data mesh journey to your particular situation.
Chapter 3: Kickstart your data mesh MVP in a month
This chapter is a hands-on example of how to go about building an MVP. The Messflix MVP focuses a lot on the organizational challenges and stays light on the technology side of things, as an MVP should. The technology details will be picked up later. The chapter provides you with tools like stakeholder mappings and FAIR principles (findable, accessible, interoperable, reusable) to get you started.
Part 2: The four principles in practice
The goal of the second part of the book is to provide you with the tools to tackle the four principles of the data mesh so you can advance your data mesh beyond the first month.
Chapter 4: Domain ownership
This chapter is all about domains and business capabilities and how you can identify suitable owners for data inside a company. It provides you with a lot of workshop techniques, including domain storytelling.
Chapter 5: Data as a product
Data is often treated as a by-product. This chapter is about changing to a product perspective called data as a product. The chapter provides examples of data products from Messflix and explains in detail concepts like the data product canvas and data ports.
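To make the idea of describing data as a product a little more concrete, here is a minimal sketch in Python of a data product described as code and serialized to JSON. It is only an illustration: the class names, the field names, the "content-production" domain, and the file location are all assumptions made up for this example, not the descriptor format used in the book.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DataPort:
    """One interface through which a data product exposes data (hypothetical shape)."""
    name: str
    protocol: str  # for example "file", "rest", or "stream"
    location: str  # illustrative address only


@dataclass
class DataProduct:
    """Minimal, illustrative description of a data product."""
    name: str
    domain: str
    owner: str
    description: str
    output_ports: list[DataPort] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict converts nested dataclasses, including those inside lists.
        return json.dumps(asdict(self), indent=2)


# All values below are invented for the example.
product = DataProduct(
    name="movie-popularity",
    domain="content-production",
    owner="content-production-team",
    description="Daily popularity scores per title.",
    output_ports=[DataPort("daily-export", "file", "exports/movie_popularity.csv")],
)

print(product.to_json())
```

Keeping the description next to the data, in a machine-readable form, is one way a product can describe itself to consumers without a meeting.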
Chapter 6: Federated computational governance
This chapter tackles data governance in the data mesh context. Inside data meshes, this is called federated computational governance, because of the balance of central and distributed governance aspects as well as an automated execution needed to unfold the data mesh. This chapter contains a discussion of centralized versus decentralized aspects, hands-on examples from Messflix, and a guide for setting up a governance team.
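As a rough illustration of what "computational" can mean here, the following Python sketch expresses a few centrally agreed policies as small functions and checks a data product's metadata against them automatically. The policy names and metadata fields are hypothetical and chosen only for this example; a real governance team would define its own.

```python
from typing import Callable

Metadata = dict[str, object]
Policy = Callable[[Metadata], bool]

# Centrally agreed policies, written as code (hypothetical examples).
POLICIES: dict[str, Policy] = {
    "has_owner": lambda m: bool(m.get("owner")),
    "has_description": lambda m: bool(m.get("description")),
    "declares_pii_flag": lambda m: isinstance(m.get("contains_pii"), bool),
}


def check_policies(metadata: Metadata) -> list[str]:
    """Return the names of all policies the metadata violates."""
    return [name for name, policy in POLICIES.items() if not policy(metadata)]


# Invented metadata for two data products owned by different teams.
compliant = {
    "owner": "content-production-team",
    "description": "Daily popularity scores per title.",
    "contains_pii": False,
}
incomplete = {"owner": "", "description": "Raw subscription events."}

print(check_policies(compliant))
print(check_policies(incomplete))
```

In this sketch the policies are defined once, centrally, while each owning team runs the check against its own product, which mirrors the balance of central and distributed responsibility described above.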
Chapter 7: The self-serve data platform
The last chapter on data mesh principles covers the platform, the enabling technology that makes the data mesh work. The chapter works through three iterations on our data platform for Messflix and explains important concepts like platform thinking along with these examples.
Part 3: Infrastructure and technical architecture
The third part focuses on all things technical. We break out of the Messflix example to highlight various architectures and discuss multiple options for moving from your existing structure to a data mesh.
Chapter 8: Comparing self-serve data platforms
This chapter explains blueprints for data mesh platforms that fit various cloud providers as well as different sizes of companies.
Chapter 9: Solution architecture design
In this chapter, we focus on the migration from your existing system to various kinds of architectures step by step and component by component. We talk about data lakes, data warehouses, REST APIs, and more.
How to use this book
We don’t want to present just another theory of the data mesh. This book is more of a structured, collective diary of actions leading to data mesh development in various environments. The emphasis is on “actions leading to.” We arrived at the data mesh after a long and often painful journey through multiple other solutions. Over the years, we’ve been testing, researching, discussing, and, last but not least, failing a lot in the process. In this book, we share with you the summary of “I wish someone had told me earlier” insights. We hope you will be able to immediately put the information you’ll get out of it, well, in action.
Depending on your goal, you can focus on different parts of this book. If your interest is purely informational, and your goal is to be able to explain the concepts to your team, your management, or your company, we recommend you put a lot of focus on chapters 1 and 2, which provide a quick overview, as well as the MVP presented in chapter 3. In addition, by reading through chapter 9 for a deeper dive into the reasons for this paradigm shift and taking a lighter look at part 2, you will be well equipped to explain the data mesh paradigm to someone else.
If you want to launch a larger initiative inside your company, you’ll need to be convincing. In that case, we recommend you take a deep dive into the entirety of chapter 9 and pay close attention to chapter 3, which offers insight into the question of whether you should start this journey at all. Chapter 4, which presents the full-scale data mesh MVP development, and chapter 2, which offers a quick glance at a lightweight application of data mesh principles, will allow you to balance the big-picture view with the requirements of implementing quickly and getting results fast. Altogether, these chapters should give you enough convincing material to get top-level buy-in.
If you’re interested in the technical side of things, like automated governance and the self-serve platform, chapters 5 to 8 will provide you with a lot of interesting content to dig through.
If you work inside a development team, we particularly recommend that you turn your attention to chapter 4. This chapter explains exactly what is broken in the current mode of thinking and should also help you advance your ways of working without ever touching the data mesh concept. Additionally, we recommend chapter 8, as it explains possible architecture alternatives for serving data from a development team’s point of view.
If you want to advance the way you work inside your data team, you could focus on chapters 3 and 4 to deeply understand the source of your current troubles. You could also focus on chapter 6 to understand what platform thinking in a data context means. Both could help you advance your ways of working without actually adopting a full data mesh approach inside the company.
We’re sure there are many more reasons for you to open up this book; these are simply a few possible ways you could go about putting this book into use.
The Messflix case study
To help you conceptualize the practical aspects of putting a data mesh in action, we combined our experiences and merged them into a single data mesh journey of Messflix LLC.
Messflix, a movie- and TV-show streaming platform, just hit a wall. A data wall. The company has all the data in the world but complains about not even being able to build a proper recommendation system for its movies and shows. The competition seems to be able to get it done; in fact, the competition is famous for being the first movers in a lot of technology sectors.
Other companies in equally complex industries seem to be able to put their data to work. Messflix does work with data, and analysts are able to get some insights from it, but the organization’s leaders don’t feel like they can call themselves data driven.
The data science trial runs all seem to end in pretty prototypes with no clear business value. The data scientists tell their managers that it’s because the product team just doesn’t want to put these great prototypes on the roadmap, or, in another instance, because the data from the source is way too messy and inconsistent.
In short, Messflix hopefully sounds like your average business, which for some reason doesn’t feel like it’s able to let the right data flow to the right use cases. The data landscape, just like the technology landscape, has grown organically over time and has become quite complex.
The two key technology components of Messflix are its Messflix Streaming Platform and Hitchcock Movie Maker. The streaming platform does just what it says: enable subscribers to watch shows and movies. The movie maker is a set of tools helping the movie production teams choose good movie topics, themes, and content.
Additionally, Messflix has a data lake with an analytics platform on top of it taking data from everywhere. A few teams manage these components. The teams Orange and White together operate a few of the Hitchcock Movie Maker tools. Team Green is all about the subscriptions, the log-in processes, etc., and team Yellow is responsible for getting things on the screen inside the streaming platform. Figure 1 depicts a rough architecture sketch of a few of these components before we briefly discuss how data is currently handled at Messflix.
Figure 1 The main Messflix software components. The data team handles a large variety of data sources and responsibilities.
The Data team gets data into the data warehouse from a few different places—for example, cost statements from the Hitchcock Movie Maker and subscriptions from the subscriptions service. The team also gets streaming data and subscription profiles from the data lake.
Then the Data team does some number crunching to transform this data into information for fraud analysis and business decisions.
Finally, this information is used by decentralized units to make those business decisions and for other use cases. This currently is a centralized workflow. The data team sits in the middle.
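To make the shape of this centralized workflow concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `CentralDataTeam` class, its method names, and the sample records are hypothetical and do not come from the Messflix systems. The point it shows is that every source lands in one place, and every consumer depends on what the one team in the middle decides to compute.

```python
from dataclasses import dataclass, field


@dataclass
class CentralDataTeam:
    """Hypothetical stand-in for the data team and its single warehouse."""
    warehouse: dict[str, list[dict]] = field(default_factory=dict)

    def ingest(self, source: str, records: list[dict]) -> None:
        # Every source, whatever team owns it, lands in the same warehouse.
        self.warehouse.setdefault(source, []).extend(records)

    def total_cost_per_movie(self) -> dict[str, float]:
        # The "number crunching" step: aggregate cost statements per movie.
        totals: dict[str, float] = {}
        for row in self.warehouse.get("cost_statements", []):
            totals[row["movie"]] = totals.get(row["movie"], 0.0) + row["cost"]
        return totals


team = CentralDataTeam()
# Illustrative records only; real cost statements would look different.
team.ingest("cost_statements", [
    {"movie": "Movie A", "cost": 100.0},
    {"movie": "Movie A", "cost": 50.0},
    {"movie": "Movie B", "cost": 75.0},
])
team.ingest("subscriptions", [{"subscriber_id": 1, "plan": "basic"}])

# Decentralized units consume only what the central team has prepared.
report = team.total_cost_per_movie()
print(report)
```

In this sketch, any new source or new report means another change to the one class in the middle, which is the bottleneck the rest of the journey addresses.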
No matter where you’re coming from and where you want to go, you will find yourself somewhere along the Messflix journey. So let’s take one final look at the complete journey Messflix is going through.
No data journey is a simple straight line. Likewise, we don’t pretend that the Messflix journey is a simple linear progression of a series of steps. You’ll see different approaches in the chapters and ways to make the data mesh fit your company, even though the Messflix example illustrates one main thread to guide you.
You can follow that main thread used by Messflix throughout chapters 2 through 6 and chapter 9. Table 1 gives you an overview of the stages of the company, as we highlight two dimensions alongside the journey to a data mesh. The first is the number of organizational units and teams affected. The second is the types of company responsibilities that are decentralized.
The core of the data mesh paradigm shift is the decentralization of the responsibility for data. But responsibility for data today is practically split into multiple parts, all of which need to be decentralized. Thus we highlight all four kinds of responsibility for data in table 1; each corresponds to one of the principles presented in part 2.
Table 1 The Messflix journey