Ebook1,090 pages12 hours

Star Schema The Complete Reference

By Christopher Adamson

Rating: 0 out of 5 stars

()

Read preview

About this ebook

The definitive guide to dimensional design for your data warehouse

Learn the best practices of dimensional design. Star Schema: The Complete Reference offers in-depth coverage of design principles and their underlying rationales. Organized around design concepts and illustrated with detailed examples, this is a step-by-step guidebook for beginners and a comprehensive resource for experts.

This all-inclusive volume begins with dimensional design fundamentals and shows how they fit into diverse data warehouse architectures, including those of W.H. Inmon and Ralph Kimball. The book progresses through a series of advanced techniques that help you address real-world complexity, maximize performance, and adapt to the requirements of BI and ETL software products. You are furnished with design tasks and deliverables that can be incorporated into any project, regardless of architecture or methodology.

Master the fundamentals of star schema design and slow change processing
Identify situations that call for multiple stars or cubes
Ensure compatibility across subject areas as your data warehouse grows
Accommodate repeating attributes, recursive hierarchies, and poor data quality
Support conflicting requirements for historic data
Handle variation within a business process and correlation of disparate activities
Boost performance using derived schemas and aggregates
Learn when it's appropriate to adjust designs for BI and ETL tools

Skip carousel

LanguageEnglish

PublisherMcGraw-Hill Education

Release dateJul 22, 2010

ISBN9780071744331

Author

Christopher Adamson

Related authors

Skip carousel

Related to Star Schema The Complete Reference

Related ebooks

Skip carousel

Modelling Business Information: Entity relationship and class modelling for Business Analysts
Ebook
Modelling Business Information: Entity relationship and class modelling for Business Analysts
byKeith Gordon
Rating: 0 out of 5 stars
0 ratings
The Data Model Resource Book: Volume 3: Universal Patterns for Data Modeling
Ebook
The Data Model Resource Book: Volume 3: Universal Patterns for Data Modeling
byLen Silverston
Rating: 0 out of 5 stars
0 ratings
Database Management for Business Leaders: Building and Using Data Solutions That Work for You
Ebook
Database Management for Business Leaders: Building and Using Data Solutions That Work for You
byLarry Ruddell
Rating: 0 out of 5 stars
0 ratings
Data Analytics with Google Cloud Platform: Build Real Time Data Analytics on Google Cloud Platform
Ebook
Data Analytics with Google Cloud Platform: Build Real Time Data Analytics on Google Cloud Platform
byMurari Ramuka
Rating: 0 out of 5 stars
0 ratings
Data warehouse Complete Self-Assessment Guide
Ebook
Data warehouse Complete Self-Assessment Guide
byGerardus Blokdyk
Rating: 4 out of 5 stars
4/5
The Data Governance Imperative
Ebook
The Data Governance Imperative
bySteve Sarsfield
Rating: 0 out of 5 stars
0 ratings
Data Governance A Complete Guide - 2021 Edition
Ebook
Data Governance A Complete Guide - 2021 Edition
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
Tabular Modeling with SQL Server 2016 Analysis Services Cookbook
Ebook
Tabular Modeling with SQL Server 2016 Analysis Services Cookbook
byDerek Wilson
Rating: 4 out of 5 stars
4/5
Expert Cube Development with SSAS Multidimensional Models
Ebook
Expert Cube Development with SSAS Multidimensional Models
byMarco Russo
Rating: 0 out of 5 stars
0 ratings
THE STEP BY STEP GUIDE FOR SUCCESSFUL IMPLEMENTATION OF DATA LAKE-LAKEHOUSE-DATA WAREHOUSE: "THE STEP BY STEP GUIDE FOR SUCCESSFUL IMPLEMENTATION OF DATA LAKE-LAKEHOUSE-DATA WAREHOUSE"
Ebook
THE STEP BY STEP GUIDE FOR SUCCESSFUL IMPLEMENTATION OF DATA LAKE-LAKEHOUSE-DATA WAREHOUSE: "THE STEP BY STEP GUIDE FOR SUCCESSFUL IMPLEMENTATION OF DATA LAKE-LAKEHOUSE-DATA WAREHOUSE"
byAJIT DASH
Rating: 3 out of 5 stars
3/5
Mastering Databricks Lakehouse Platform: Perform Data Warehousing, Data Engineering, Machine Learning, DevOps, and BI into a Single Platform (English Edition)
Ebook
Mastering Databricks Lakehouse Platform: Perform Data Warehousing, Data Engineering, Machine Learning, DevOps, and BI into a Single Platform (English Edition)
bySagar Lad
Rating: 0 out of 5 stars
0 ratings
Microsoft Tabular Modeling Cookbook
Ebook
Microsoft Tabular Modeling Cookbook
byPaul te Braak
Rating: 0 out of 5 stars
0 ratings
Data Strategy: From definition to execution
Ebook
Data Strategy: From definition to execution
byIan Wallis
Rating: 0 out of 5 stars
0 ratings
The Absolute Guide to Dashboarding and Reporting with Power BI: How to Design and Create a Financial Dashboard with Power BI – End to End
Ebook
The Absolute Guide to Dashboarding and Reporting with Power BI: How to Design and Create a Financial Dashboard with Power BI – End to End
byKasper de Jonge
Rating: 0 out of 5 stars
0 ratings
Managing Data Quality: A practical guide
Ebook
Managing Data Quality: A practical guide
byTim King
Rating: 0 out of 5 stars
0 ratings
Implementing Power BI in the Enterprise
Ebook
Implementing Power BI in the Enterprise
byGreg Low
Rating: 5 out of 5 stars
5/5
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
Ebook
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
byRalph Kimball
Rating: 0 out of 5 stars
0 ratings
Tableau 10 Business Intelligence Cookbook
Ebook
Tableau 10 Business Intelligence Cookbook
bySantos Donabel
Rating: 0 out of 5 stars
0 ratings
Azure Storage, Streaming, and Batch Analytics: A guide for data engineers
Ebook
Azure Storage, Streaming, and Batch Analytics: A guide for data engineers
byRichard Nuckolls
Rating: 0 out of 5 stars
0 ratings
Data Engineering on Azure
Ebook
Data Engineering on Azure
byVlad Riscutia
Rating: 0 out of 5 stars
0 ratings
DAX Patterns: Second Edition
Ebook
DAX Patterns: Second Edition
byMarco Russo
Rating: 5 out of 5 stars
5/5
Tableau Desktop Certified Associate: Exam Guide: Develop your Tableau skills and prepare for Tableau certification with tips from industry experts
Ebook
Tableau Desktop Certified Associate: Exam Guide: Develop your Tableau skills and prepare for Tableau certification with tips from industry experts
byDmitry Anoshin
Rating: 0 out of 5 stars
0 ratings
The Data Model Resource Book, Volume 1: A Library of Universal Data Models for All Enterprises
Ebook
The Data Model Resource Book, Volume 1: A Library of Universal Data Models for All Enterprises
byLen Silverston
Rating: 0 out of 5 stars
0 ratings
The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data
Ebook
The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data
byRalph Kimball
Rating: 4 out of 5 stars
4/5
Data Processing and Modeling with Hadoop: Mastering Hadoop Ecosystem Including ETL, Data Vault, DMBok, GDPR, and Various Data-Centric Tools
Ebook
Data Processing and Modeling with Hadoop: Mastering Hadoop Ecosystem Including ETL, Data Vault, DMBok, GDPR, and Various Data-Centric Tools
byVinicius Aquino do Vale
Rating: 0 out of 5 stars
0 ratings
The Applied SQL Data Analytics Workshop - Second Edition: Develop your practical skills and prepare to become a professional data analyst, 2nd Edition
Ebook
The Applied SQL Data Analytics Workshop - Second Edition: Develop your practical skills and prepare to become a professional data analyst, 2nd Edition
byMatt Goldwasser
Rating: 0 out of 5 stars
0 ratings
Learn Data Warehousing in 24 Hours
Ebook
Learn Data Warehousing in 24 Hours
byAlex Nordeen
Rating: 0 out of 5 stars
0 ratings
Practical Data Analysis
Ebook
Practical Data Analysis
byHector Cuesta
Rating: 4 out of 5 stars
4/5
Microsoft Power BI The Ultimate Step-By-Step Guide
Ebook
Microsoft Power BI The Ultimate Step-By-Step Guide
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
Mastering Data Analysis with R
Ebook
Mastering Data Analysis with R
byDaróczi Gergely
Rating: 5 out of 5 stars
5/5

Databases For You

Skip carousel

Oracle DBA Mentor: Succeeding as an Oracle Database Administrator
Ebook
Oracle DBA Mentor: Succeeding as an Oracle Database Administrator
byBrian Peasland
Rating: 0 out of 5 stars
0 ratings
Blockchain Basics: A Non-Technical Introduction in 25 Steps
Ebook
Blockchain Basics: A Non-Technical Introduction in 25 Steps
byDaniel Drescher
Rating: 5 out of 5 stars
5/5
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
Ebook
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
byWalter Shields
Rating: 4 out of 5 stars
4/5
Beginning Microsoft Power BI: A Practical Guide to Self-Service Data Analytics
Ebook
Beginning Microsoft Power BI: A Practical Guide to Self-Service Data Analytics
byDan Clark
Rating: 0 out of 5 stars
0 ratings
Access 2019 For Dummies
Ebook
Access 2019 For Dummies
byLaurie A. Ulrich
Rating: 0 out of 5 stars
0 ratings
Grokking Algorithms: An illustrated guide for programmers and other curious people
Ebook
Grokking Algorithms: An illustrated guide for programmers and other curious people
byAditya Bhargava
Rating: 4 out of 5 stars
4/5
Summary of Building a Second Brain: by Tiago Forte - A Proven Method to Organize Your Digital Life and Unlock Your Creative Potential - A Comprehensive Summary
Ebook
Summary of Building a Second Brain: by Tiago Forte - A Proven Method to Organize Your Digital Life and Unlock Your Creative Potential - A Comprehensive Summary
byAlexander Cooper
Rating: 2 out of 5 stars
2/5
COMPUTER SCIENCE FOR ROOKIES
Ebook
COMPUTER SCIENCE FOR ROOKIES
byAngel Bahabwa
Rating: 0 out of 5 stars
0 ratings
Practical Data Analysis
Ebook
Practical Data Analysis
byHector Cuesta
Rating: 4 out of 5 stars
4/5
CompTIA DataSys+ Study Guide: Exam DS0-001
Ebook
CompTIA DataSys+ Study Guide: Exam DS0-001
byMike Chapple
Rating: 0 out of 5 stars
0 ratings
Behind Every Good Decision: How Anyone Can Use Business Analytics to Turn Data into Profitable Insight
Ebook
Behind Every Good Decision: How Anyone Can Use Business Analytics to Turn Data into Profitable Insight
byPiyanka Jain
Rating: 5 out of 5 stars
5/5
Python Projects for Everyone
Ebook
Python Projects for Everyone
byMohamad Charara
Rating: 0 out of 5 stars
0 ratings
Learn SQL in 24 Hours
Ebook
Learn SQL in 24 Hours
byAlex Nordeen
Rating: 5 out of 5 stars
5/5
Access 2016 For Dummies
Ebook
Access 2016 For Dummies
byLaurie A. Ulrich
Rating: 0 out of 5 stars
0 ratings
Go in Action
Ebook
Go in Action
byErik St. Martin
Rating: 5 out of 5 stars
5/5
The Analytic Detective: Decipher Your Company’s Data Clues and Become Irreplaceable
Ebook
The Analytic Detective: Decipher Your Company’s Data Clues and Become Irreplaceable
bySteve Leeds
Rating: 0 out of 5 stars
0 ratings
Access for Beginners: Access Essentials, #1
Ebook
Access for Beginners: Access Essentials, #1
byM.L. Humphrey
Rating: 0 out of 5 stars
0 ratings
Learn SQL Server Administration in a Month of Lunches
Ebook
Learn SQL Server Administration in a Month of Lunches
byDon Jones
Rating: 3 out of 5 stars
3/5
SQL Programming & Database Management For Absolute Beginners SQL Server, Structured Query Language Fundamentals: "Learn - By Doing" Approach And Master SQL
Ebook
SQL Programming & Database Management For Absolute Beginners SQL Server, Structured Query Language Fundamentals: "Learn - By Doing" Approach And Master SQL
byWilliam Sullivan
Rating: 5 out of 5 stars
5/5
Learning Oracle 12c: A PL/SQL Approach
Ebook
Learning Oracle 12c: A PL/SQL Approach
bySham Tickoo
Rating: 0 out of 5 stars
0 ratings
Access 2010 All-in-One For Dummies
Ebook
Access 2010 All-in-One For Dummies
byAlison Barrows
Rating: 4 out of 5 stars
4/5
Learn Git in a Month of Lunches
Ebook
Learn Git in a Month of Lunches
byRick Umali
Rating: 0 out of 5 stars
0 ratings
Azure SQL Revealed: A Guide to the Cloud for SQL Server Professionals
Ebook
Azure SQL Revealed: A Guide to the Cloud for SQL Server Professionals
byBob Ward
Rating: 0 out of 5 stars
0 ratings
A Concise Guide to Object Orientated Programming
Ebook
A Concise Guide to Object Orientated Programming
byalasdair gilchrist
Rating: 0 out of 5 stars
0 ratings
Getting Started with SQL Server 2014 Administration
Ebook
Getting Started with SQL Server 2014 Administration
byGethyn Ellis
Rating: 0 out of 5 stars
0 ratings
Python and SQLite Development
Ebook
Python and SQLite Development
byAgus Kurniawan
Rating: 0 out of 5 stars
0 ratings
Practical SQL
Ebook
Practical SQL
byDavid Perry
Rating: 4 out of 5 stars
4/5
LINUX: Beginner's Crash Course. Your Step-By-Step Guide To Learning The Linux Operating System And Command Line Easy & Fast!
Ebook
LINUX: Beginner's Crash Course. Your Step-By-Step Guide To Learning The Linux Operating System And Command Line Easy & Fast!
byJeremy Li
Rating: 3 out of 5 stars
3/5
SQL in 30 Pages
Ebook
SQL in 30 Pages
byU.Q. Magnusson
Rating: 4 out of 5 stars
4/5
Learning PostgreSQL
Ebook
Learning PostgreSQL
byJuba Salahaldin
Rating: 1 out of 5 stars
1/5

Related podcast episodes

Skip carousel

007: Data Cleansing & Analysis with Oz du Soleil: Oz du Soleil is an Excel MVP since 2015 and is an expert in data cleansing & analysis. He has an Excel blog over at www.datascopic.net which is his commitment to data literacy. He’s the leading author on the revised version of Guerrilla Data...
Podcast episode
007: Data Cleansing & Analysis with Oz du Soleil: Oz du Soleil is an Excel MVP since 2015 and is an expert in data cleansing & analysis. He has an Excel blog over at www.datascopic.net which is his commitment to data literacy. He’s the leading author on the revised version of Guerrilla Data...
byLearn Microsoft Excel with MyExcelOnline
0 ratings
0% found this document useful
SnowflakeDB: The Data Warehouse Built For The Cloud - Episode 110: An interview about how SnowflakeDB was built to provide a performant and flexible data platform for the cloud era
Podcast episode
SnowflakeDB: The Data Warehouse Built For The Cloud - Episode 110: An interview about how SnowflakeDB was built to provide a performant and flexible data platform for the cloud era
byData Engineering Podcast
0 ratings
0% found this document useful
015: Excel Power BI With Mynda Treacy: Mynda Treacy is an Excel MVP since 2014 who resides in sunny Queensland, Australia and is an expert in Power BI and Excel Dashboards. In 2009 she started an online training website called MyOnlineTrainingHub.com, where they provide free and paid...
Podcast episode
015: Excel Power BI With Mynda Treacy: Mynda Treacy is an Excel MVP since 2014 who resides in sunny Queensland, Australia and is an expert in Power BI and Excel Dashboards. In 2009 she started an online training website called MyOnlineTrainingHub.com, where they provide free and paid...
byLearn Microsoft Excel with MyExcelOnline
0 ratings
0% found this document useful
Building An Enterprise Data Fabric At CluedIn - Episode 74: An interview about building an enterprise data fabric at scale to ease enterprise data integration
Podcast episode
Building An Enterprise Data Fabric At CluedIn - Episode 74: An interview about building an enterprise data fabric at scale to ease enterprise data integration
byData Engineering Podcast
0 ratings
0% found this document useful
Introduction to Data Mesh
Podcast episode
Introduction to Data Mesh
byThe Cloudcast
0 ratings
0% found this document useful
#69 Effective Data Storytelling: How to Turn Insights into Action
Podcast episode
#69 Effective Data Storytelling: How to Turn Insights into Action
byDataFramed
0 ratings
0% found this document useful
SaaStr 180: David Skok on Why You Should Not Focus On CAC/LTV In The Early Days, What Is The Right Way To Analyse Sales Rep Productivity & The Leading Indicators Early Stage VCs Use to Assess Product Market Fit: David Skok is a serial entrepreneur turned VC at Matrix Partners. He founded four companies: Skok Systems, Corporate Software Europe, Watermark Software, and SilverStream Software and did one turnaround with Xionics. Three of the companies he founded...
Podcast episode
SaaStr 180: David Skok on Why You Should Not Focus On CAC/LTV In The Early Days, What Is The Right Way To Analyse Sales Rep Productivity & The Leading Indicators Early Stage VCs Use to Assess Product Market Fit: David Skok is a serial entrepreneur turned VC at Matrix Partners. He founded four companies: Skok Systems, Corporate Software Europe, Watermark Software, and SilverStream Software and did one turnaround with Xionics. Three of the companies he founded...
byThe Official SaaStr Podcast: SaaS | Founders | Investors
0 ratings
0% found this document useful
The Rise of Modern Data Management // Chad Sanderson // #226
Podcast episode
The Rise of Modern Data Management // Chad Sanderson // #226
byMLOps.community
0 ratings
0% found this document useful
Organizing for Successful Data Science at Stitch Fix with Eric Colson - TWiML Talk #257: For the final episode of our Strata Data series, we’re joined by Eric Colson, Chief Algorithms Officer at Stitch Fix, whose presentation at the conference explored “How to make fewer bad decisions.” Our discussion focuses in on the three key...
Podcast episode
Organizing for Successful Data Science at Stitch Fix with Eric Colson - TWiML Talk #257: For the final episode of our Strata Data series, we’re joined by Eric Colson, Chief Algorithms Officer at Stitch Fix, whose presentation at the conference explored “How to make fewer bad decisions.” Our discussion focuses in on the three key...
byThe TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
0 ratings
0% found this document useful
Eric Olden: Managing Identity, Security, and Resiliency in Multi-Cloud Environments
Podcast episode
Eric Olden: Managing Identity, Security, and Resiliency in Multi-Cloud Environments
byThe PrOTect OT Cybersecurity Podcast
0 ratings
0% found this document useful
How Data Lakes Support ML in Industry - with Cloudera's Amr Awadallah: If you're going to apply machine learning (ML) in a business context, you need a lot of data, and algorithms across the board perform better with more recent, rich, and relevant data. Today, there are companies whose entire business models are...
Podcast episode
How Data Lakes Support ML in Industry - with Cloudera's Amr Awadallah: If you're going to apply machine learning (ML) in a business context, you need a lot of data, and algorithms across the board perform better with more recent, rich, and relevant data. Today, there are companies whose entire business models are...
byThe AI in Business Podcast
0 ratings
0% found this document useful
Data Strategy and Customer Experience (with Google and Albertsons): #DataScience #CustomerExperience Data is central to how companies compete, nurture customer relationships, and develop brand loyalty through end-to-end customer experience. In this environment, data strategy is crucial to business success. But,...
Podcast episode
Data Strategy and Customer Experience (with Google and Albertsons): #DataScience #CustomerExperience Data is central to how companies compete, nurture customer relationships, and develop brand loyalty through end-to-end customer experience. In this environment, data strategy is crucial to business success. But,...
byCXOTalk
0 ratings
0% found this document useful
Overcoming Cognitive Bias: How to recognize and avoid the many belief, behavioral, and social biases, which flaw much human decision making.
Podcast episode
Overcoming Cognitive Bias: How to recognize and avoid the many belief, behavioral, and social biases, which flaw much human decision making.
byRocketship.fm
0 ratings
0% found this document useful
Active Assist with Chris Law + SkySQL with Robert Hedgepeth: Max Saltonstall is back in the co-host seat, joining Mark for a fun chat about Active Assist.
Podcast episode
Active Assist with Chris Law + SkySQL with Robert Hedgepeth: Max Saltonstall is back in the co-host seat, joining Mark for a fun chat about Active Assist.
byGoogle Cloud Platform Podcast
0 ratings
0% found this document useful
66: A guide to data models and dynamic dashboards for marketers
Podcast episode
66: A guide to data models and dynamic dashboards for marketers
byHumans of Martech
0 ratings
0% found this document useful
Data Privacy Challenges and Where to Start with Skyflow’s Ari Hoffman: Ari Hoffman has spent his career helping businesses implement and solve technically challenging problems at companies like Cisco and Fivserv. Today, he serves as the Director of Customer Programs at Skyflow, where he helps Skyflow customers tackle th...
Podcast episode
Data Privacy Challenges and Where to Start with Skyflow’s Ari Hoffman: Ari Hoffman has spent his career helping businesses implement and solve technically challenging problems at companies like Cisco and Fivserv. Today, he serves as the Director of Customer Programs at Skyflow, where he helps Skyflow customers tackle th...
byPartially Redacted: Data, AI, Security, and Privacy
0 ratings
0% found this document useful
87: Michael Katz: The Evolution of packaged CDPs, democratizing ML and the myths of composable and zero data copy
Podcast episode
87: Michael Katz: The Evolution of packaged CDPs, democratizing ML and the myths of composable and zero data copy
byHumans of Martech
0 ratings
0% found this document useful
#27 New Report Reveals Trends and Challenges for the Industry: On this episode, Rekha Srivatsan, VP of Product Marketing at Salesforce Service Cloud, discusses the findings of the sixth edition of the Salesforce State of Service report. The report dives into some of the biggest trends in the customer success industry, including the increasing demand for personalized and fast service, challenges of implementing self-service while maintaining customer trust, and the use of AI and data and why it’s crucial in meeting customer expectations and improving customer experience.
Podcast episode
#27 New Report Reveals Trends and Challenges for the Industry: On this episode, Rekha Srivatsan, VP of Product Marketing at Salesforce Service Cloud, discusses the findings of the sixth edition of the Salesforce State of Service report. The report dives into some of the biggest trends in the customer success industry, including the increasing demand for personalized and fast service, challenges of implementing self-service while maintaining customer trust, and the use of AI and data and why it’s crucial in meeting customer expectations and improving customer experience.
byExperts of Experience
0 ratings
0% found this document useful
Tan Vachiramon - Choosing the right algorithm for your real-world problem
Podcast episode
Tan Vachiramon - Choosing the right algorithm for your real-world problem
byTowards Data Science
0 ratings
0% found this document useful
Integrating AI into Every Day Business with AI Expert Alex Castro CEO and founder of M Corp
Podcast episode
Integrating AI into Every Day Business with AI Expert Alex Castro CEO and founder of M Corp
byThe Scale Up Show
0 ratings
0% found this document useful
Vocally Self-critical leadership with Skyler Ramirez
Podcast episode
Vocally Self-critical leadership with Skyler Ramirez
byThink Like Amazon
0 ratings
0% found this document useful
Getting through a SOC 2 audit with your nerves intact: On this sponsored episode of the podcast, Ben and Ryan talk with James Ciesielski, CTO and co-founder, and Megan Dean, information security and risk compliance manager, both of Rewind. We talk about how you can prep for and successfully get through a SOC 2 audit, how backing up your SaaS data can provide business continuity, and the benefits of establishing a relationship with your auditor.
Podcast episode
Getting through a SOC 2 audit with your nerves intact: On this sponsored episode of the podcast, Ben and Ryan talk with James Ciesielski, CTO and co-founder, and Megan Dean, information security and risk compliance manager, both of Rewind. We talk about how you can prep for and successfully get through a SOC 2 audit, how backing up your SaaS data can provide business continuity, and the benefits of establishing a relationship with your auditor.
byThe Stack Overflow Podcast
0 ratings
0% found this document useful
MLOps.community #6 - Mid Scale Production Feature Engineering with Dr. Venkata Pingali
Podcast episode
MLOps.community #6 - Mid Scale Production Feature Engineering with Dr. Venkata Pingali
byMLOps.community
0 ratings
0% found this document useful
Evaluate Your Organization's Security with OCI: Join Lois Houston and Nikita Abraham, along with special guests Nancy Kramer and Betina Tagle from Oracle’s corporate security organization, as they discuss the steps you can take to evaluate your organization’s security, privacy, and compliance...
Podcast episode
Evaluate Your Organization's Security with OCI: Join Lois Houston and Nikita Abraham, along with special guests Nancy Kramer and Betina Tagle from Oracle’s corporate security organization, as they discuss the steps you can take to evaluate your organization’s security, privacy, and compliance...
byOracle University Podcast
0 ratings
0% found this document useful
Transforming Data-Driven Decisions into Success - with Allan Willie, Founder and CEO Klipfolio
Podcast episode
Transforming Data-Driven Decisions into Success - with Allan Willie, Founder and CEO Klipfolio
byMetrics that Measure Up
0 ratings
0% found this document useful
Data Governance with Jessi Ashdown and Uri Gilad: Hosts Stephanie Wong and Priyanka Vergadia learn about data governance this week in an interesting chat with Jessi Ashdown and Uri Gilad.
Podcast episode
Data Governance with Jessi Ashdown and Uri Gilad: Hosts Stephanie Wong and Priyanka Vergadia learn about data governance this week in an interesting chat with Jessi Ashdown and Uri Gilad.
byGoogle Cloud Platform Podcast
0 ratings
0% found this document useful
How Explainable AI is Critical to Building Responsible AI // Krishna Gade MLOps // Meetup #53
Podcast episode
How Explainable AI is Critical to Building Responsible AI // Krishna Gade MLOps // Meetup #53
byMLOps.community
0 ratings
0% found this document useful
First Command’s Darren Pedroza on Being a Data Activist and Leading your Company Through Digital Transformation: Darren Pedroza, VP Enterprise Data and Analytics at First Command Financial Services, discusses the new role of the CDO, agile data management best practices, and how to lead teams through the deep, sometimes scary, waters of digital transformation.
Podcast episode
First Command’s Darren Pedroza on Being a Data Activist and Leading your Company Through Digital Transformation: Darren Pedroza, VP Enterprise Data and Analytics at First Command Financial Services, discusses the new role of the CDO, agile data management best practices, and how to lead teams through the deep, sometimes scary, waters of digital transformation.
byThe Data Chief
0 ratings
0% found this document useful
Credit Karma’s Ryan Graciano on Data Marts, Data Models, and Disrupting the Credit Landscape: Ryan Graciano, CTO of Credit Karma, discusses the importance of data hygiene, explainable AI and why data literacy is something you have to continuously work at.
Podcast episode
Credit Karma’s Ryan Graciano on Data Marts, Data Models, and Disrupting the Credit Landscape: Ryan Graciano, CTO of Credit Karma, discusses the importance of data hygiene, explainable AI and why data literacy is something you have to continuously work at.
byThe Data Chief
0 ratings
0% found this document useful
MLOps Investments // Sarah Catanzaro // Coffee Session #33
Podcast episode
MLOps Investments // Sarah Catanzaro // Coffee Session #33
byMLOps.community
0 ratings
0% found this document useful

Skip carousel

Understanding ELT & ETL
Techfastly
Article
Understanding ELT & ETL
Apr 1, 2021
8 min read
What is ELT?
Techfastly
Article
What is ELT?
Apr 1, 2021
It stands for extract, load, and transform- the processes a data pipeline uses for replicating the data from a source system into a target system such as a cloud data warehouse. 1. Extraction is the first step in which data is copied from the source
6 min read
Want A Job In Data Science? You Might Have To Take A Standardized Test When Applying
Chicago Tribune
Article
Want A Job In Data Science? You Might Have To Take A Standardized Test When Applying
Jul 10, 2018
3 min read
Data Fabric
PC Pro Magazine
Article
Data Fabric
Aug 13, 2020
3 min read
What You Need to Know About Data Modeling
Entrepreneur
Article
What You Need to Know About Data Modeling
Jan 1, 2013
2 min read
Q&A
Rotman Management
Article
Q&A
May 1, 2023
Describe the capability that companies like Netflix, UPS, Amazon and Caesars Entertainment have in common. These are all leading firms in their industries with respect to leveraging analytics as a source of competitive advantage. We now have so much
7 min read
The Algorithmic Leader
Rotman Management
Article
The Algorithmic Leader
Jan 1, 2020
9 min read
Data-driven Decision Making That Uses Data, Mind And Heart
The European Business Review
Article
Data-driven Decision Making That Uses Data, Mind And Heart
Jan 31, 2020
14 min read
There’s A New Career In Town
True Love
Article
There’s A New Career In Town
Oct 21, 2019
2 min read
Tech Essentials for Startup Success
Inc.
Article
Tech Essentials for Startup Success
Aug 13, 2024
2 min read
Q&A
Rotman Management
Article
Q&A
Jan 1, 2022
You believe the time has come to bridge the ‘cultural gulf’ that exists within most organizations. Please explain. Historically, business leaders have had competencies in the physical domain — related to manufacturing and distributing products in an
7 min read
‘Blueprints’ Help Small Business Take Advantage Of The Cloud
Futurity
Article
‘Blueprints’ Help Small Business Take Advantage Of The Cloud
Sep 6, 2019
2 min read
Signals Of Change: how To Evolve For The New Global Reality
Rotman Management
Article
Signals Of Change: how To Evolve For The New Global Reality
May 1, 2022
11 min read
Why Wait For Perfect?
NZ Marketing
Article
Why Wait For Perfect?
Jun 9, 2021
6 min read
“The Biggest Problem I See When People Are Working From Home Is A Poorly Designed Network”
PC Pro Magazine
Article
“The Biggest Problem I See When People Are Working From Home Is A Poorly Designed Network”
Jun 8, 2023
6 min read
YOU’VE GOT DATA …now what?
NZ Marketing
Article
YOU’VE GOT DATA …now what?
Sep 23, 2019
10 min read
Salesforce Adding Einstein Analytics Al To Tableau Platform
Techfastly
Article
Salesforce Adding Einstein Analytics Al To Tableau Platform
Feb 4, 2021
3 min read
Transforming To Paper-lite In Your Business
Facility Management
Article
Transforming To Paper-lite In Your Business
Jun 27, 2019
4 min read
The Case for Updating the Hiring Process
Entrepreneur
Article
The Case for Updating the Hiring Process
Jun 1, 2014
3 min read
Ready for Whatever Is Next Thanks to a Cutting-Edge CRM Platform
Inc.
Article
Ready for Whatever Is Next Thanks to a Cutting-Edge CRM Platform
Aug 15, 2023
2 min read
Getting The edge
The European Business Review
Article
Getting The edge
Feb 25, 2021
7 min read
Pivoting To First-party Data
NZ Marketing
Article
Pivoting To First-party Data
Jun 9, 2021
5 min read
Strategic Drivers FOR THE POST-PANDEMIC ERA
The European Business Review
Article
Strategic Drivers FOR THE POST-PANDEMIC ERA
Feb 25, 2021
10 min read
The Strategy Palette: Five Approaches to Strategy for a Complex World
Rotman Management
Article
The Strategy Palette: Five Approaches to Strategy for a Complex World
Sep 1, 2018
10 min read
The Era of Human + Machine Innovation
Rotman Management
Article
The Era of Human + Machine Innovation
Jan 1, 2019
Interview by Karen Christensen In today's environment, organizations that don't keep up with customers' evolving needs are doomed. What is the best way to get a handle on these evolving needs? The first step in understanding your customers is to acce
5 min read
The Significance of Machine Learning
Techfastly
Article
The Significance of Machine Learning
Mar 1, 2022
3 min read
Looking For Cx Excellence? start With Your Data
NZ Marketing
Article
Looking For Cx Excellence? start With Your Data
Sep 22, 2021
One common thread drives great customer experiences: first-party data. The CX Framework is a valuable reference for best-practice customer experience. Developed by The CX Academy in Ireland (which in my view offers some of the best training for CX pr
2 min read
More Than The Numbers
Facility Management
Article
More Than The Numbers
Jun 24, 2018
5 min read
Expansion: You Reap What You Sow
Inc.
Article
Expansion: You Reap What You Sow
Oct 24, 2023
13 min read
3 Salesforce Buys Slack In A $27.4B Deal.
Techfastly
Article
3 Salesforce Buys Slack In A $27.4B Deal.
Jan 6, 2021
5 min read

Related categories

Skip carousel

Reviews for Star Schema The Complete Reference

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Star Schema The Complete Reference - Christopher Adamson

Star Schema

The Complete Reference™

About the Author

Christopher Adamson is an expert in star schema design and an experienced data warehousing practitioner. He speaks and teaches worldwide, writes about data warehousing, and provides data warehousing services through Oakton Software LLC.

As a consultant, Chris works with customers in all industries to develop data warehousing strategies, identify and prioritize projects, design and modify solutions, and manage implementations. His customers have included Fortune 500 companies, government agencies, and data warehousing software vendors.

Chris has taught dimensional design to thousands of students worldwide. He is a faculty member at The Data Warehousing Institute (TDWI), teaches at conference events, and provides onsite education.

Chris has penned many articles and books about data warehousing. He is author of Mastering Data Warehouse Aggregates (Wiley 2006) and co-author of Data Warehouse Design Solutions (Wiley 1998). He blogs about data warehousing and answers questions from readers at www.StarSchemaCentral.com.

Chris is the founder of Oakton Software LLC, and can be contacted through his web site at www.ChrisAdamson.net.

Star Schema

The Complete Reference™

Christopher Adamson

Copyright © 2010 by The McGraw-Hill Companies, Inc. All rights reserved. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher.

ISBN: 978-0-07-174433-1

MHID: 0-07-174433-9

The material in this eBook also appears in the print version of this title: ISBN: 978-0-07-174432-4, MHID: 0-07-174432-0.

All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occurrence of a trademarked name, we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trademark. Where such designations appear in this book, they have been printed with initial caps.

McGraw-Hill eBooks are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training programs. To contact a representative please e-mail us at bulksales@mcgraw-hill.com.

Information has been obtained by McGraw-Hill from sources believed to be reliable. However, because of the possibility of human or mechanical error by our sources, McGraw-Hill, or others, McGraw-Hill does not guarantee the accuracy, adequacy, or completeness of any information and is not responsible for any errors or omissions or the results obtained from the use of such information.

TERMS OF USE

This is a copyrighted work and The McGraw-Hill Companies, Inc. (McGraw-Hill) and its licensors reserve all rights in and to the work. Use of this work is subject to these terms. Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one copy of the work, you may not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate, sell, publish or sublicense the work or any part of it without McGraw-Hill’s prior consent. You may use the work for your own noncommercial and personal use; any other use of the work is strictly prohibited. Your right to use the work may be terminated if you fail to comply with these terms.

THE WORK IS PROVIDED AS IS. McGRAW-HILL AND ITS LICENSORS MAKE NO GUARANTEES OR WARRANTIES AS TO THE ACCURACY, ADEQUACY OR COMPLETENESS OF OR RESULTS TO BE OBTAINED FROM USING THE WORK, INCLUDING ANY INFORMATION THAT CAN BE ACCESSED THROUGH THE WORK VIA HYPERLINK OR OTHERWISE, AND EXPRESSLY DISCLAIM ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. McGraw-Hill and its licensors do not warrant or guarantee that the functions contained in the work will meet your requirements or that its operation will be uninterrupted or error free. Neither McGraw-Hill nor its licensors shall be liable to you or anyone else for any inaccuracy, error or omission, regardless of cause, in the work or for any damages resulting there from. McGraw-Hill has no responsibility for the content of any information accessed through the work. Under no circumstances shall McGraw-Hill and/or its licensors be liable for any indirect, incidental, special, punitive, consequential or similar damages that result from the use of or inability to use the work, even if any of them has been advised of the possibility of such damages. This limitation of liability shall apply to any claim or cause whatsoever whether such claim or cause arises in contract, tort or otherwise.

To Gladys, Justin, and Carter. You are the source of unending happiness, you lift me up when I need it, and I will always be proud of you.

Contents at a Glance

Part I Fundamentals

1 Analytic Databases and Dimensional Design

2 Data Warehouse Architectures

3 Stars and Cubes

Part II Multiple Stars

4 A Fact Table for Each Process

5 Conformed Dimensions

Part III Dimension Design

6 More on Dimension Tables

7 Hierarchies and Snowflakes

8 More Slow Change Techniques

9 Multi-Valued Dimensions and Bridges

10 Recursive Hierarchies and Bridges

Part IV Fact Table Design

11 Transactions, Snapshots, and Accumulating Snapshots

12 Factless Fact Tables

13 Type-Specific Stars

Part V Performance

14 Derived Schemas

15 Aggregates

Part VI Tools and Documentation

16 Design and Business Intelligence

17 Design and ETL

18 How to Design and Document a Dimensional Model Index

Acknowledgments

Introduction

Part I Fundamentals

Chapter 1 Analytic Databases and Dimensional Design

Dimensional Design

Purpose

Measurement and Context

Facts and Dimensions

Grouping Dimensions and Facts

The Star Schema

Dimension Tables

Keys and History

Fact Tables

Using a Star Schema

Querying Facts

Browsing Dimensions

Guiding Principles

Summary

Further Reading

Chapter 2 Data Warehouse Architectures

Inmon’s Corporate Information Factory

Kimball’s Dimensional Data Warehouse

Stand-Alone Data Marts

Architecture and Dimensional Design

Contrasting the Approaches

The Common Element

Terms Used in This Book

Summary

Further Reading

Chapter 3 Stars and Cubes

Dimension Table Features

Surrogate Keys and Natural Keys

Rich Set of Dimensions

Grouping Dimensions into Dimension Tables

Fact Table Features

Fact Tables and Processes

Capturing Facts

Grain

Sparsity

Degenerate Dimensions

Slowly Changing Dimensions

Type 1 Change

Type 2 Change

Choosing and Implementing Response Types

Cubes

Summary

Further Reading

Part II Multiple Stars

Chapter 4 A Fact Table for Each Process

Fact Tables and Business Processes

Facts that Have Different Timing

A Single Fact Table Causes Difficulties

Modeling in Separate Fact Tables

Facts that Have Different Grain

A Single Fact Table Causes Difficulties

Modeling in Separate Fact Tables

Analyzing Facts from More than One Fact Table

The Peril of Joining Fact Tables

Drilling Across

Drill-Across Implementations

Summary

Further Reading

Chapter 5 Conformed Dimensions

The Synergy of Multiple Stars

Dimensions and Drilling Across

What Causes Failure?

Identical Tables Not Required

Conformed Dimensions

Types of Dimensional Conformance

Planning Conformance

Architecture and Conformance

Dimensional Data Warehouse

Corporate Information Factory

Stand-Alone Data Marts

Summary

Further Reading

Part III Dimension Design

Chapter 6 More on Dimension Tables

Grouping Dimensions into Tables

Two Ways of Relating Dimension Attributes

When Struggling with Dimension Groupings

Breaking Up Large Dimensions

Splitting Dimension Tables Arbitrarily

Alternatives to Split Dimensions

Mini-Dimensions Alleviate ETL Bottlenecks and Excessive Growth

Dimension Roles and Aliasing

Avoiding the NULL

Problems Caused by NULL

Avoiding NULL Foreign Key Values

Uses for Special-Case Rows

Behavioral Dimensions

Converting Facts to Dimensions at Query Time

Designing and Using Behavioral Dimensions

Design Considerations for Behavioral Dimensions

Summary

Further Reading

Chapter 7 Hierarchies and Snowflakes

Drilling

The Concept of Drilling

The Reality of Drilling

Attribute Hierarchies and Drilling

The Attribute Hierarchy

Drilling Within an Attribute Hierarchy

Other Ways to Drill

Documenting Attribute Hierarchies

Snowflakes

Avoiding the Snowflake

Embracing the Snowflake

Outriggers

Repeating Groups

Eliminating Repeating Groups with Outriggers

Outriggers and Slow Change Processing

Summary

Further Reading

Chapter 8 More Slow Change Techniques

Time-Stamped Dimensions

Point-in-Time Status of a Dimension

The Time-Stamped Solution

Type 3 Changes

Study All Facts with Old or New Dimension Values

The Type 3 Solution

Hybrid Slow Changes

Conflicting Requirements

The Hybrid Response

Evaluating and Extending the Hybrid Approach

Summary

Further Reading

Chapter 9 Multi-Valued Dimensions and Bridges

Standard One-to-Many Relationships

Multi-Valued Dimensions

Simplifying the Relationship

Using a Bridge for Multi-Valued Dimensions

Multi-Valued Attributes

Simplifying the Multi-Valued Attribute

Using an Attribute Bridge

Summary

Further Reading

Chapter 10 Recursive Hierarchies and Bridges

Recursive Hierarchies

Rows Referring to Other Rows

The Reporting Challenge

Flattening a Recursive Hierarchy

A Flattened Hierarchy

Drawbacks of Flattening

When Flattening Works Best

The Hierarchy Bridge

Hierarchy Bridge Design

Using the Bridge

Double-Counting

Resolving the Many-to-Many Relationship

Potential Misuse

Changes and the Hierarchy Bridge

Type 1 Changes in the Dimension or Bridge

Type 2 Changes to the Dimension

Type 2 Changes to the Hierarchy

Variations on the Hierarchy Bridge

Embellishing the Bridge

Multiple Parents

Multiple Hierarchies

Summary

Further Reading

Part IV Fact Table Design

Chapter 11 Transactions, Snapshots, and Accumulating Snapshots

Transaction Fact Tables

Describing Events

Properties of Transaction Fact Tables

Snapshot Fact Tables

The Challenge: Studying Status

The Snapshot Model

Snapshot Considerations

Accumulating Snapshot Fact Tables

Challenge: Studying Elapsed Time Between Events

The Accumulating Snapshot

Accumulating Snapshot Considerations

Summary

Further Reading

Chapter 12 Factless Fact Tables

Events with No Facts

Nothing to Measure?

The Factless Fact Table

Using a Factless Fact Table

Adding a Fact

Conditions, Coverage, or Eligibility

Why Model Conditions?

Factless Fact Tables for Conditions

Comparing Activities and Conditions

Slowly Changing Dimensions and Conditions

Summary

Further Reading

Chapter 13 Type-Specific Stars

Type-Specific Attributes

Operational Systems

Analytic Systems

Core and Custom Stars

Core and Custom Dimension Tables

Core and Custom Fact Tables

Other Considerations

Using Generic Attributes

Generic Attributes

Using a Generic Design

Summary

Further Reading

Part V Performance

Chapter 14 Derived Schemas

Restructuring Dimensional Data

Uses for Derived Schemas

Derived Schemas Already Covered

The Cost of Derived Schemas

The Merged Fact Table

Precomputed Drill-Across Results

Simplified Process Comparison

Improved Performance

Supporting Tools that Cannot Drill Across

Single-Process Analysis

Including a Nonshared Dimension

The Pivoted Fact Table

The Need to Pivot Data

The Pivoted Advantage

Drawbacks to Pivoting

The Sliced Fact Table

Creating Slices of a Star

Uses for Sliced Fact Tables

Slices First

Set Operation Fact Tables

Comparing Two Sets of Data

Several Possible Comparisons

Choosing to Precompute Set Operations

Summary

Further Reading

Chapter 15 Aggregates

Fundamentals of Aggregates

Summarizing Base Data

Using Aggregates

Loading Aggregates

Cubes as Aggregates

Making Aggregates Invisible

Aggregate Navigation

Aggregate Generation

Alternative Summary Designs

Transformative Summaries May Also Be Useful

Single Table Designs Should Be Avoided

Summary

Further Reading

Part VI Tools and Documentation

Chapter 16 Design and Business Intelligence

Business Intelligence and SQL Generation

SQL Generators

The Limitations of SQL Generators

Guidelines for the Semantic Layer

Features to Avoid

Features to Use

Working with SQL-Generating BI Tools

Multiple Stars

Semi-Additivity

Browse Queries

Bridge Tables

Working with Cube-Based BI

Cube-Centric Business Intelligence

Auto-Generation of Cubes

Summary

Further Reading

Chapter 17 Design and ETL

The ETL Process

A Complex Task

Tools Used by the ETL Process

Architecture and the ETL Process

Loading a Star

A Top-Level Dependency

Loading a Dimension Table

Loading the Fact Table

Optimizing the Load

Changed Data Identification

Simplifying Processing

Cleansing Data

What Should Be Cleaned Up

Cleaning Up Dimensional Data

Facts with Invalid Details

Housekeeping Columns

Housekeeping Columns in Dimension Tables

Housekeeping and Fact Tables

Summary

Further Reading

Chapter 18 How to Design and Document a Dimensional Model

Dimensional Design and the Data Warehouse Life Cycle

The Strategic Importance of Dimensional Design

When to Do Dimensional Design

Design Activities

Planning the Design Effort

Conducting Interviews

Designing the Dimensional Model

Prioritizing Plans

Documenting the Results

Documenting a Dimensional Model

Requirements Documentation

Top-Level Design Documentation

Detailed Design Documentation

Logical vs. Physical

Summary

Further Reading

Index

Acknowledgments

This book would not have been possible without the help of many people. That list starts with people like you—people who have read my books, attended my classes, e-mailed me questions, and visited my blog. Over the years, your comments and questions have helped me shape this reference to dimensional design. I hope you will find it relevant, practical, and useful. Thanks for your feedback, and keep it coming.

This book was also made possible by my customers, past and present. Thank you for always providing new challenges. Meeting them together never fails to be a fulfilling experience.

Lisa McClain of McGraw-Hill made this book a reality. My thanks to her for championing the project, and to everyone who has worked on this book’s development, production, marketing, and distribution.

Several people have given me opportunities over the years, without which you would not be reading these words today. Thanks to Cameron Hendershot and Marilyn Feldman, Joe Warren, Argee Mahecha, Matt Comstock, Mike Venerable, and David Wells. Many colleagues have also influenced my trajectory, some unknowingly so. Thanks to Jesse Baldwin, Jim Hadley, Mike Lynott (who taught me about unboiling frogs), Greg Jones, and Randall Porter. I am also grateful to Chris Date and Laura Reeves, who have given me useful advice.

Special thanks to Ralph Kimball, whose generous advice, support, and encouragement have been greatly valued, and whose contributions to this field are of immeasurable value to us all.

I love teaching and want to thank everyone at The Data Warehousing Institute who makes it possible. Wayne Eckerson, Paul Kautza, Yvonne Baho, Sandra Prescott, Brenda Woodbridge, Jennifer Noland, Nancy Hanlon, and Heather Flynn are a few of the people at TDWI to whom I am indebted. My gratitude also goes out to all others not named here who organize TDWI events with professionalism and attention to detail.

To my friends and family: your friendship and love are what is most valuable to me. I cannot name all of you here, but you know who you are. Special thanks and love to Mom, Sis, Jason, Aya, Papa, Magal, Shell, Eli, and Sofie. My love also to Dad who, though no longer here, continues to be an important part of who I am.

Introduction

Dimensional design is a pillar of every modern-day data warehouse architecture. Based on a disarmingly simple approach to process measurement, dimensional design enables extraordinarily powerful analytics. The products of dimensional design—the star schema, the snowflake, and the cube—can be found in virtually every data warehouse implementation.

Despite this popularity, relatively little is written about dimensional design. Although some outstanding works are available, most assume a particular architecture or philosophy—my own prior work included. Additionally, these treatments are organized around vertical industries or major business functions, making them difficult to refer to when faced with a specific design challenge.

This book is a complete reference to dimensional design—the first intended for any reader. The best practices presented in this volume cut across all architectures, including those espoused by W.H. Inmon and Ralph Kimball. Organized around the key concepts of dimensional design, this book provides full, in-depth treatment of each topic, sequenced in a logical progression from fundamentals through advanced techniques.

This book is designed for both beginners and experts in dimensional design. If you are a beginner, it is the ideal place to start. Each chapter provides you with best practices and their underlying rationale, detailed examples, and the criteria you need to make design decisions. If you are an expert, you will be able to use this guide as a reference. Whenever you face a particular design challenge, you will find a chapter or section dedicated to the topic.

Dimensional design enables profoundly powerful business analysis. A solid understanding of the underlying principles is essential, whether you are directly involved in design activities, work with dimensional data structures, manage projects, or fund implementations. Mastery of the techniques and best practices in this book will help you unleash the full potential of your data warehouse, regardless of architecture, implementation scope, or software tools.

About This Book

This book has been designed as a complete, in-depth reference for anyone who works with dimensional data—the star, the snowflake, or the cube.

• The content is organized into chapters and sections dedicated to the core concepts of dimensional design so you can find everything you need to know about a particular topic in one place.

• Each topic is treated comprehensively. Full explanations for best practices allow you to make informed design decisions based on operational realities.

• No assumptions are made about your data warehouse environment. The best practices here apply in all architectures, including those espoused by W.H. Inmon and Ralph Kimball.

• Specific software products are not referenced, but the ways in which your tools may influence design decisions are fully explored.

The result is a treatment that is comprehensive and useful, regardless of your level of experience, data warehouse architecture, or available tools.

Organized Around Core Concepts

This book is organized around the core concepts of dimensional modeling, rather than a series of business scenarios by vertical industry. Focusing on these concepts allows a complete treatment of each topic, without forcing you to flip back and forth between various business cases. Each topic is explored in depth, rather than spread across multiple chapters.

This comprehensive treatment of each concept allows Star Schema: The Complete Reference to serve as a useful resource. Experienced modelers will find what they need with a quick scan through the Table of Contents. Need to brush up on the implications of a snapshot design? Everything you need can be found in Chapter 11. Thinking about implementing a bridge table? It’s all there in Chapter 9. Need to implement a hybrid slow change? A complete discussion can be found in Chapter 8. Each chapter concludes with references to external treatments of the topic, should you wish to search for more examples.

For those new to dimensional design, the material has been sequenced so the book can be read cover to cover. The first three chapters explore fundamentals, and subsequent sections delve deeply into various aspects of dimensional design. Help on choosing where to start is provided at the end of this introduction.

Comprehensive and Practical, Not Dogmatic

While this book highlights a series of best practices, the underlying motivation is always fully explored. You will learn the reasons for these guidelines, and develop the ability to make informed decisions on how to apply them. The result is a practical approach to data warehouse design—one that is responsive to organizational and operational context, rather than independent of it.

Dimensional designers, for example, are often trained to record information at the lowest level of detail possible. The reasons for this guideline are fully explained in Chapter 3, along with situations where these reasons might not apply. Similarly, designers are always taught that different business processes deserve their own models, or stars. Chapter 4 explains why this is the case, and fully explores what happens when this guideline is relaxed.

Even when you stick to the best practices, there is no single right way to model a particular business process. You will learn how each design option strikes a balance among business value, the required effort to construct reports, the complexity of the load process, and cost. Flattening a recursive hierarchy, for example, simplifies reporting and reduces development cost, but limits the power of the final solution; the alternatives are fully explored in Chapter 10. Derived schemas can make reporting easier and improve performance, but provide significant additional work to load data into the data warehouse, as described in Chapter 14.

Architecture-Neutral

This book makes no assumptions about your data warehouse architecture. The best practices outlined in these pages apply whether you follow W.H. Inmon’s Corporate Information Factory approach or Ralph Kimball’s dimensional data warehouse bus approach, or simply build subject-area data marts. In each of these paradigms, there is a place for dimensional data. No matter how you put dimensional design to work, this book will allow you to make the most of it.

If you don’t know anything about these thought leaders or their recommended architectures, you will learn something about them in Chapter 2. There, you will find a high-level overview of various approaches, and information on how dimensional design fits into each. What you won’t find is an argument in favor of one approach over another. This book’s coverage of dimensional design is disentangled from such considerations. Anyone can use it.

Common Vocabulary

This book is designed to service any data warehouse architecture, but it is necessary to establish a common vocabulary. When it comes to dimensional design, that vocabulary comes from Ralph Kimball. By providing a way to talk about dimensional design, he has made a valuable contribution to the world of data warehousing, giving us terms like grain, conformance, and slowly changing dimensions. These and other terms can be found in his seminal work on dimensional design: The Data Warehouse Toolkit, Second Edition, by Ralph Kimball and Margy Ross (Wiley, 2002).

Wherever possible, this book makes use of terminology established by Kimball and Ross. Each term will be fully explained. However, it is not presumed that the reader adheres to Kimball’s approach to data warehousing. His approach is one of several architectures that make use of dimensional design. These architectures are discussed in Chapter 2; the principles in this book can be employed in any of these situations.

Product-Independent

This book makes no assumptions about specific hardware or software products in your data warehouse architecture. The dimensional techniques described are largely universal, and can be implemented using tools and technologies from a variety of vendors.

This is not to say that the software products used by your organization will not influence your dimensional design. To the contrary, they can, will, and should bear such influence. Although specific software products will not be discussed, the influence of various kinds of tools will be explored. These include database management systems (DBMSs), reporting or business intelligence (BI) software, and data integration or extract transform load (ETL) tools.

The capabilities of your RDBMS and reporting tools, for example, may drive the decision to produce a snowflake design, rather than a star, as you will learn in Chapter 7. The capabilities of a business intelligence tool, or the sophistication of its users, may shape your approach to schema design issues outlined in Chapter 16. Development of the ETL process is complex, and may benefit from some design considerations discussed in Chapter 17.

Snowflakes and Cubes

Most of the examples in this book feature the star schema. The principles of dimensional modeling can also be used to design snowflakes and cubes. The best practices are largely the same, with a few exceptions that are highlighted and explored. The snowflake is featured in Chapter 7; the influence of business intelligence tools on this design option are discussed in Chapter 16. The cube is introduced in Chapter 3; many useful ways to pair stars with cubes are explored in Chapters 14, 15, and 16.

Who Should Read This Book

This book is written for you, the data warehouse practitioner. If your work in any way involves stars, snowflakes, or cubes, then this is your guide to all things dimensional. No assumptions are made regarding your skill level, role, or preferred architecture.

You may design dimensional models, work with dimensional data, manage activities, or pay the bills. Your role may fall into a variety of categories, including:

• Business Analysis

• Data Architecture / Star Schema Design

• Business Intelligence and Reporting

• Data Integration or ETL

• Database Administration

• Quality Assurance

• Data Administration

• Project Management

• Executive Leadership / IT Management

• Power User

It will be assumed that you have a basic familiarity with relational database concepts like tables, columns, and joins. There will be occasional examples of SQL code; these will be fully explained for the benefit of novice readers.

No assumptions are made about your level of experience. If you are new to dimensional design, you will probably want to read this book from cover to cover. Experienced practitioners may prefer to skip directly to areas of particular interest. The next section provides advice on how to proceed.

Using This Book

This book is designed to be used in two ways. You can read it cover to cover, or consult it as a reference. The book is divided into six parts, with chapters organized in a logical progression. When looking for a particular topic, you can scan the chapter and section headings to find what you need. Key features of each chapter help highlight important concepts and aid your understanding of them.

Key Features of Each Chapter

Each chapter in this book provides in-depth treatment of a core topic in dimensional design. Design techniques are evaluated, best practices are identified, and alternatives are fully explored. When you finish a chapter, you will understand the design principles, the reasons behind best practices, and how to evaluate possible design alternatives.

Some special elements are used to help highlight important information.

Tips

Tips are used to highlight best practices:

TIP Whether reading the book cover to cover, or skipping around to pertinent passages, tips will be there to highlight the key lessons of each section.

These tips are like conclusions; they result from a full exploration of design alternatives. Reading the body of the chapter will bring to light the reasons behind these guidelines, along with situations in which you may wish to deviate.

Examples

Detailed examples are used to illustrate every design technique. Most examples are accompanied with schema designs, as well as instance diagrams that illustrate the content of important tables. These diagrams will help you understand design techniques; the text will refer to them, call out important features, highlight the sample data, and show how the technique works to solve the design challenge.

Notes

Some examples may feature design elements that are not central to the focus of the topic or section. Interesting but off-topic considerations are highlighted in Notes, which may also direct you to other chapters where the topic is fully explored.

NOTE Notes are used to alert you to additional considerations dealt with elsewhere in the book, or to touch on topics not central to dimensional design.

Contents of This Book

Star Schema: The Complete Reference is divided into six parts, each of which focuses on a major category of dimensional design techniques. A summary of each section follows. For additional details, you may wish to scan the Table of Contents.

Part I: Fundamentals

Part I focuses on the fundamentals of dimensional design. It includes chapters that focus on process measurement, data warehouse architecture, and star schema design.

Chapter 1: Analytic Databases and Dimensional Design The fundamentals of process measurement are introduced in this chapter, including facts, dimensions, and the star schema.

Chapter 2: Data Warehouse Architectures Three very different architectures make use of the star schema, including those advocated by W.H. Inmon and Ralph Kimball. This chapter sorts through each architecture’s use of the star, and highlights how the same terms take on different meanings in each paradigm.

Chapter 3: Stars and Cubes In this chapter, you will learn the fundamentals of star schema design and slowly changing dimensions, and explore the different ways cubes may be incorporated into a data warehouse architecture.

Part II: Multiple Stars

Part II takes the first steps out of the neat and perfect world of the simple example and ventures into the real world of complex designs. It deals with a fundamental challenge that novice designers must learn to tackle: modeling different business processes as different stars.

Chapter 4: A Fact Table for Each Process This chapter teaches you how to identify discrete processes and provide separate stars for each. It also looks at how to produce analysis that crosses process boundaries.

Chapter 5: Conformed Dimensions The concept of conformed dimensions allows you to support and compare a variety of business processes, ensuring compatibility even if implementations make use of different technologies. Dimensional conformance is often considered to be of strategic importance, and can serve as the basis of a roadmap for incremental implementation.

Part III: Dimension Design

Part III dives deeply into advanced techniques that surround the dimensions of a dimensional design. It is divided into five chapters.

Chapter 6: More on Dimension Tables In this chapter, you will learn how to determine what dimensions to place in the same table, how to stem unmanageable growth in dimension tables, and how to handle information that is optional or unavailable.

Chapter 7: Hierarchies and Snowflakes This chapter explores the technique known as snowflaking, and explains how modeling attribute hierarchies may facilitate the implementation of reporting tools.

Chapter 8: More Slow Change Techniques This chapter goes beyond the basic type 1 and type 2 slow changes presented in Chapter 3, covering type 3 slow changes, time-stamping techniques, and hybrid slow change responses.

Chapter 9: Multi-Valued Dimensions and Bridges Sometimes, a dimension can take on multiple values with respect to a single fact, such as multiple salespeople collaborating on a single order. This chapter explores techniques for dealing with these situations, from simple flattening to the use of bridge tables.

Chapter 10: Recursive Hierarchies and Bridges Dimensions often embody recursive hierarchies, such as departments that report to other departments. This chapter shows how to flatten these hierarchies for a simple solution, and how to make use of a hierarchy bridge for powerful and flexible analysis.

Part IV: Fact Table Design

Part IV provides in-depth treatment of advanced features that center on fact tables. It is composed of three chapters.

Chapter 11: Transactions, Snapshots, and Accumulating Snapshots This chapter covers situations in which a standard transaction-based fact table falls short, and shows how periodic snapshots and accumulating snapshots can be put to use.

Chapter 12: Factless Fact Tables Sometimes fact tables contain no facts. In this chapter, you will learn about transaction-based factless fact tables, as well as factless fact tables that capture coverage or conditions.

Chapter 13: Type-Specific Stars This chapter looks at situations where subsets of data have different facts and dimensions, and shows how to make use of core and custom stars for a single process.

Part V: Performance

Any dimensional schema can be supplemented with additional structures that are intended to improve performance or simplify the reporting process. Part V looks at two kinds of supplemental data structures that support high performance.

Chapter 14: Derived Schemas This chapter teaches you to use derived schemas to provide enhanced performance, simplify report development, or address specific departmental needs.

Chapter 15: Aggregates This chapter shows how summary tables, or aggregates, can provide powerful performance boosts when implemented wisely.

Part VI: Tools and Documentation

The last part of this book looks at additional topics of interest for the developers of dimensional models.

Chapter 16: Design and Business Intelligence This chapter explores the influence of business intelligence (BI) tools on your dimensional design. It explores some common dimensional features that often strain BI tools, as well as techniques to mitigate any shortcomings.

Chapter 17: Design and ETL This chapter provides an overview of the process of loading the dimensional schema, and highlights numerous model features that can assist ETL developers in optimizing the process.

Chapter 18: How to Design and Document a Dimensional Model This provides standard tasks and deliverables that can be worked into your data warehouse life cycle, regardless of architecture.

Where to Start

As mentioned earlier, you can read this book from cover to cover, or skip directly to sections that deal with a topic of interest. How you make use of it will largely depend on your current skill level.

Beginners Those new to dimensional design should start by reading Part I: Fundamentals. Once you’ve finished that section, you can read the rest in sequence, or skip to any section of particular interest. If you are getting ready to start a project, you may wish to turn to the last chapter: How to Design and Document a Dimensional Model.

Novices If you have some dimensional design experience, you may wish to skip the fundamentals in Part I. However, you are encouraged to review the data warehouse terms described at the end of Chapter 2. Terms like data warehouse, data mart, and source system can take on very different meanings in different architectures; Chapter 2 explains how these terms will be used in this book. If you’ve done most of your learning on the job, you may also want to review Chapter 3 for standard dimensional terminology used in this book—terms such as grain, natural key, and slowly changing dimension are all defined there.

Experts Experienced dimensional modelers can use this book as a reference, consulting it for detailed information on design issues as they arise. A quick scan through the Table of Contents will help you find any dimensional design topic of interest. You may also wish to read the book cover to cover, enhancing your mastery of dimensional design.

Regardless of how you approach the material in these pages, it will enable you to master the principles of dimensional design, unlocking the full potential of your data warehouse.

PART I

Fundamentals

CHAPTER 1

Analytic Databases and Dimensional Design

CHAPTER 2

Data Warehouse Architectures

CHAPTER 3

Stars and Cubes

CHAPTER 1

Analytic Databases and Dimensional Design

This book describes a set of powerful and effective techniques for the design of analytic databases. These techniques unleash business analytics in a very simple way: they model the measurement of business processes.

The dimensional model of a business process is made up of two components: measurements and their context. Known as facts and dimensions, these components are organized into a database design that facilitates a wide variety of analytic usage. Implemented in a relational database, the dimensional model is called a star schema. Implemented in a multidimensional database, it is known as a cube. If any part of your data warehouse includes a star schema or a cube, it leverages dimensional design.

This chapter introduces the basic concepts of dimensional design, which arose in response to the unique requirements of analytic systems. The concept of measurement is defined in terms of facts and dimensions, and translated into a database design, or star schema. This chapter shows how basic interaction with a star schema can support a wide variety of measurement scenarios, and defines two simple guiding principles for the development of dimensional solutions.

Dimensional Design

Information systems fall into two major categories: those that support the execution of business processes and those that support the analysis of business processes. The principles of dimensional design have evolved as a direct response to the unique requirements of analytic systems. The core of every dimensional model is a set of business metrics that captures how a process is evaluated, and a description of the context of every measurement.

Purpose

Analytic systems and operational systems serve fundamentally different purposes. An operational system supports the execution of a business process, while an analytic system supports the evaluation of the process. Their distinct purposes are reflected in contrasting usage profiles, which in turn suggest that different principles will guide their design.

Operational Systems

An operational system directly supports the execution of a business process. By capturing details about significant events or transactions, it constructs a record of activities. A sales system, for example, captures information about orders, shipments, and returns; a human resources system captures information about the hiring and promotion of employees; an accounting system captures information about the management of the financial assets and liabilities of the business. The activities recorded by these systems are sometimes known as transactions. The systems themselves are sometimes called online transaction processing (OLTP) systems, or transaction systems for short.

To facilitate the execution of a business process, operational systems must enable several types of database interaction, including inserts, updates, and deletes. The focus of these interactions is almost always atomic: a specific order, a shipment, a refund. These interactions will be highly predictable in nature. For example, an order entry system must provide for the management of lists of products, customers, and salespeople; the entering of orders; the printing of order summaries, invoices, and packing lists; and the tracking order status.

Because it is focused on process execution, the operational system is likely to update data as things change, and purge or archive data once its operational usefulness has ended. When a customer moves, for example, his or her old address is no longer useful for shipping products or sending invoices, so it is simply overwritten.

Implemented in a relational database, the optimal schema design for an operational system is widely accepted to be one that is in third normal form. The design may be depicted as an entity-relationship model, or ER model. Coupled with appropriate database technology, this design supports high-performance inserting, updating, and deleting of atomic transactions in a consistent and predictable manner. Developers refer to the characteristics of transaction processing as the ACID properties—atomic, consistent, isolated, and durable.

Analytic Systems

While the focus of the operational system is the execution of a business process, the analytic system supports the evaluation of the process. How are orders trending this month versus last? Where does this put us in comparison to our sales goals for the quarter? Is a particular marketing promotion having an impact on sales? Who are our best customers? These questions deal with the measurement of the overall orders process, rather than asking about individual orders.

Interaction with an analytic system takes place exclusively through queries that retrieve data about business processes; information is not created or modified. These queries can involve large numbers of transactions, in contrast to the operational system’s typical focus on individual transactions. Specific questions asked are less predictable, and more likely to change over time. Historic data will remain important to the analytic system long after its operational use has passed. The differences between operational systems and analytic systems are highlighted in Figure 1-1.

The principles of dimensional modeling address the unique requirements of analytic systems. A dimensional design is optimized for queries that may access large volumes of transactions, not just individual transactions. It is not burdened with supporting concurrent, high-performance updates. It supports the maintenance of historic data, even as the operational systems change or delete information.

Figure 1-1 Operational systems vs. analytic systems

Measurement and Context

The founding principle of dimensional design is disarmingly simple. Dimensional design supports analysis of a business process by modeling how it is measured.

Measurement is easy to discern, whether by listening to people talk or reading a report or chart. Consider the following business questions:

• What are gross margins by product category for January?

• What is the average account balance by education level?

• How many sick days were taken by marketing employees last year?

• What are the outstanding payables by vendor?

• What is the return rate by supplier?

Each of these questions centers on a business process: sales, account management, attendance, payables, return processing. These process-centric questions do not focus on individual activities or transactions. To answer them, it is necessary to look at a group of transactions.

Most importantly, each of these questions reveals something about how its respective business process is measured. The study of sales involves the measurement of gross margin. Financial institutions measure account balance. In human resources, they measure number of absences. The finance department measures payables. Purchasing managers watch the return quantities.

Without some kind of context, a measurement is meaningless. If you are told sales were $10,000, there is not much you can do with this information. Is that sales of a single product, or many products? Does it represent a single transaction, or the company’s total sales from conception to date? Without some context, the measurement is useless.

As with the measurements themselves, context is revealed in business questions or reports. In the preceding questions, for example, gross margin is viewed in the context of product categories and time (the month of January). Sick days are viewed in the context of a department (marketing) and time (last year). Payables are viewed in the context of their status (outstanding) and vendor.

These two simple concepts, measurement and context, are the foundation of dimensional design. Every dimensional solution describes a process by capturing what is measured and the context in which the measurements are evaluated.

Facts and Dimensions

In a dimensional design, measurements are called facts, and context descriptors are called dimensions. Every dimensional design sorts information requirements into these categories. They may be identified within statements or questions, or found within report specifications. Sorting them into categories for facts and dimensions is easy, once you know what to look for.

Facts and Dimensions in Speech

In a spoken or written statement, the word by is almost always followed by a dimension. Consider the question "What are order dollars by product category for January? Clearly, the person asking this question wants a separate measurement for each product category, as indicated by the words by product category." Product category is a dimension.

Similarly, the word for is also a good indicator of the presence of a dimension. In this case, the next word is likely to be an instance value of a dimension. For example, "What are order dollars by product category for January? Here, for is followed by January." January is a value or instance of a dimension. The dimension is not named, but we can infer that it is month.

Facts tend to be numeric in value, and people want to see them at various levels of detail. You can identify facts by looking for things that people want to roll up or break out. Again, look at the question What are order dollars by product category for January? Here, order dollars will presumably be numeric, and they can be rolled up by categories.

Not everything that is numeric is a fact. Sometimes, a numeric data element is really a dimension. The key is to determine how it is being used. Is it something that can be specified at varying levels of detail? If so, it is a fact. Is it providing context? If so, it is a dimension. Show me margin dollars by order number, contains two numeric elements. Margin dollars can be specified at various levels of detail, and the speaker is asking that it be rolled up by order number. Clearly, margin dollars is a fact. What about order number? It too is numeric, but the speaker is not asking that order numbers be added together. Instead, order number is being used to specify the context for margin dollars. Order number is a dimension.

Other examples of numeric data elements behaving as dimensions include size, ages, phone numbers, document numbers, and unit amounts such as unit cost or unit price. The clue that these elements are dimensions lies in their use. The numbers themselves are not rolled up or down according to the question. Instead, they specify the context by which something else may be rolled up or down.

Facts and Dimensions in Reports

Facts and dimensions can also be recognized by considering how they would be used if they were stored in a database and shown on a report. Dimensions serve as filters, or query predicates. The report page shown in Figure 1-2, for example, is filtering detailed data for the western region and January 2009. These filters imply several dimensions: region, month, and year.

Figure 1-2 Sample page from an orders report

In a report, dimensions also serve to specify groupings or break levels, or to identify levels of subtotals. This can be seen in the orders report, where category is used to break up the report into sections; individual categories are also used to label rows containing subtotals. These features are clues that category is a dimension.

Dimensions can also be identified by their use in controlling the aggregation of measurements. In the orders report, several measurements are specified for each product name and SKU. (SKU is a retail term that identifies an individual product, or stock keeping unit.) Product and SKU are dimensions. Notice, too, that dimension values may be used to sort data.

Facts can also be spotted based on their use in queries or reports. Elements that are aggregated, summarized, or subtotaled are facts. Clearly, the report in Figure 1-2 contains the facts cost and order dollars. These numbers are being specified at various levels, including products, categories, and a grand total. Look again, and you may notice another fact: quantity ordered. Although it is not being subtotaled, this number is presumably being aggregated. In this case, it looks like quantity ordered by product is being aggregated from individual orders.

The style of the report in Figure 1-2 is sometimes referred to as tabular. Other reports may cross-reference facts and dimensions in matrix format, or even display facts graphically. The values being plotted are facts; axis values, bar labels, and qualifications are dimensions.

In some cases, it can be hard to tell whether a data element is a fact or a dimension. Sometimes, a single data element can be used in both ways. Chapter 6, More on Dimension Tables, discusses behavioral attributes, which are dimensions that have been derived from facts. Later, Chapter 11, Transactions, Snapshots, and Accumulating Snapshots, looks at techniques used to construct facts out of dimension values. It is also possible to find a process where there does not appear to be a clear measurement. Chapter 12, Factless Fact Tables, describes how the factless fact table can be used to deal with this kind of situation. You also may find that some important business metrics are not so easily rolled up. Chapter 3, Stars and Cubes, considers nonadditive facts, while Chapter 11 discusses the curious case of semi-additive facts.

Grouping Dimensions and Facts

A dimensional design organizes facts and dimensions for storage in a database. It is common for a set of dimensions to share relationships to one another, independent of facts. These are grouped together in a single table to reflect their natural clustering. Similarly, facts that are available at the same level of detail are grouped together.

For example, suppose a study of the orders process reveals the facts and dimensions shown in Figure 1-3. In this example, products are being sold by a distributor to retail stores, so the customers are companies. Each of the facts listed in the column on the left may be rolled up according to the dimension values in the right-hand column.

Notice natural affinities exist among the dimension values. Some of them clearly belong together. For example, product and product description both seem to be associated with SKU. Salesperson name is determined by salesperson ID. Other dimensions are more distantly associated. Products, for example, have brands. Salespeople work in regions. Figure 1-4 shows how the facts and dimensions for the orders process might be grouped together.

Figure 1-3 Facts and dimensions for the orders process

The diagram in Figure 1-4 depicts measurement of the orders process. It identifies the important measurements for the process and shows the dimensional context in which those measurements are evaluated. The facts and dimensions in this diagram can be combined in a number of ways, answering a wide array of business questions. Take any fact (or facts), add the word by, and then any combination of dimensions. The diagram can be easily understood by a businessperson. Things I measure go in the middle; the ways I look at them go in the surrounding boxes, loosely grouped or categorized.

The dimensional framework for the orders process in Figure 1-4 can easily be translated into a database design, or more specifically a dimensional design. A dimensional design organizes facts and dimensions for storage in a database management system. In a relational database management system (RDBMS), the design is referred to as a star schema. In a multidimensional database (MDB), the design is referred to as a cube. This chapter introduces the star schema; we will return to cubes in Chapter 3.

Figure 1-4 Sorting out facts and dimensions for the orders process

The Star Schema

A dimensional design for a relational database is called a star schema. Related dimensions are grouped as columns in dimension tables, and the facts are stored as columns in a fact table. The star schema gets its name from its appearance: when drawn with the fact table in the center, it looks like a star or asterisk. Figure 1-5 shows a simple star schema, based on the facts and dimensions for the orders process.

Dimension Tables

In a star schema, a dimension table contains columns representing dimensions. As discussed, these columns will provide context for facts. Sometimes, dimension table is simply shortened to dimension. This may give rise to confusion, since that word can also describe the individual columns. For example, when describing the schema shown in Figure 1-5, one might refer to the columns product, product_description, and SKU as dimensions, but it is also common practice to refer to the table itself as the product dimension.

Generally, you can tell from the context whether the word dimension refers to a table or column. When it would not otherwise be clear, this book will use the terms dimension attribute and dimension table to make the distinction.

The dimension tables serve to provide the rich context needed for the study of facts. In queries and reports, the dimensions will be used to specify how facts will be rolled up—their level of aggregation. Dimension values may be used to filter reports. They will be used to provide context for each measurement, usually in the form of textual labels that precede facts on each row of a report. They may also be used to drive master-detail relationships, subtotaling, cross-tabulation, or sorts.

Figure 1-5 A simple star schema for the orders process

Readers with experience designing operational systems will notice that the dimension tables are not in third normal form. At first, many developers find this disconcerting. For example, there may be a strong desire to move brand, brand_code, and brand_manager into a separate table called brand, Remember, though, that a dimensional model serves a different purpose from an ER model. It is not necessary to isolate repeating values in an environment that does not support transaction processing. Designers do occasionally perform additional normalization within dimensions, although they usually avoid doing so. In such cases, the schema is referred to as a snowflake. The additional tables that result are sometimes called outriggers. Situations in which snowflaking may be useful are discussed in Chapter 7, Hierarchies and Snowflakes.

Because the dimensions in a dimension table are often loosely related, it can sometimes be difficult to decide how to group things. For example, should sales region be part of the salesperson dimension, or does it belong as a separate geographical dimension? Issues like this are tackled in Chapter 6.

Keys and History

In a star schema, each dimension table is given a surrogate key. This column is a unique identifier, created exclusively for the data warehouse. Surrogate keys are assigned and maintained as part of the process that loads the star schema. The surrogate key has no intrinsic meaning; it is typically an integer. Surrogate keys are sometimes referred to as warehouse keys. The surrogate key is the primary key of the dimension table.

In this book, surrogate keys will be easily identifiable by the suffix _key in the column name. In Figure 1-5, for example, the surrogate key for the customer dimension is called customer_key, the surrogate key for the salesperson dimension is called salesperson_key, and so forth. Illustrations in this book will always list the surrogate key for a dimension table as its first attribute.

Dimension tables also contain key columns that uniquely identify something in an operational system. Examples in Figure 1-5 include customer_id, sku, and salesperson_id. In the operational systems, these columns identify specific customers, products, and salespeople, respectively. These key columns are referred to as natural keys.

The separation of surrogate keys and natural keys allows the data warehouse to track changes, even if the originating operational system does not. For example, suppose that customer ABC Wholesalers is identified by customer_id 10711 in an operational system. If the customer changes its headquarters location, the operational system may simply overwrite the address for customer_id 10711. For analytic purposes, however, it may be useful to track the history of ABC Wholesalers. Since the star schema does not rely on customer_id to identify a unique row in the customer dimension, it is possible to store multiple versions of ABC Wholesalers, even though both have the same customer_id—10711. The two versions can be distinguished by different surrogate key values. While it would also be possible to support change tracking by supplementing a natural key with a sequence number, the surrogate key allows fact and dimension tables to be joined based on a single column.

The term slowly changing dimension refers to the manner in which a dimensional schema responds to changes in a source system. Detailed examples will be explored in Chapter 3, Stars and Cubes. Advanced techniques are explored in Chapter 8, More Slow Change Techniques.

Fact Tables

At the core of a star schema is the fact table. In addition to presenting the facts, the fact table includes surrogate keys that refer to each of the associated dimension tables. The simple orders star in Figure 1-5, for example, includes the facts quantity_ordered, cost_dollars, and order_dollars. It also includes surrogate keys that refer to products, salespeople, customers, and order dates.

Together, the foreign keys in a fact table are sometimes considered to identify a unique row in the fact table. This is certainly true in Figure 1-5, where each fact table row represents orders of a product sold by a salesperson to a customer on a given day. In other cases, however, the foreign keys in a fact table are not sufficient to identify a unique row. As we will see in Chapter 3, sometimes a fact table row can be uniquely identified by a subset of its foreign keys, or even by using some nonkey attributes.

Each row in the fact table stores facts at a specific level of detail. This level of detail is known as the fact table’s grain, a term that will be explored in Chapter 3. The information held in fact tables may be consumed at a variety of different levels, however, by aggregating the facts. In some data warehouse architectures, it is critical that the star schema capture information at the lowest level of detail possible. In other architectures, this is less important because a separate part of the data warehouse architecture is reserved for atomic data. These variations in data warehouse architecture are explored in Chapter 2, Data Warehouse Architectures.

Using a Star Schema

There is still much to be said about the fundamentals of dimensional design, a discussion that continues in Chapter 3, but enough background has now been provided to look at an important feature of the star schema: how it is actually used. Understanding the basic usage pattern of the star schema allows the dimensional designer to make intelligent choices.

Querying Facts

Most queries against a star schema follow a consistent pattern. One or more facts are requested, along with the dimensional attributes that provide the desired context. The facts will be summarized in accordance with the dimensions present in the query. Dimension values are also used to limit the scope of the query, serving as the basis for filters or constraints on the data to be fetched and aggregated.

A properly configured relational database is well equipped to respond to such a query, which is issued using Structured Query Language (SQL). Suppose that someone has asked to see a report showing order dollars by product category and product name during the month of January 2009. The orders star schema from Figure 1-5 can provide this information, even though order dollars is stored at a lower level of detail. The SQL query in Figure 1-6 produces the required results, summarizing tens of thousands of fact table rows.

The SELECT clause of the query indicates the dimensions that should appear in the query results (category and product), the fact that is requested (order dollars), and the manner in which it will be aggregated (through the SQL SUM() operation). The FROM clause specifies the star schema tables that are involved in the query.

The WHERE clause serves two purposes. First, it filters the query results based on the values of specific dimension columns (month and year). It also specifies the join relationships between tables in the query. In terms of processing time, joins are among the most expensive operations the database must perform; notice that in the case of a star schema, dimension attributes are always a maximum of one join away from facts. The GROUP BY clause specifies the context to which the fact will be aggregated by the relational database; the ORDER

Enjoying the preview?

Page 1 of 1

Star Schema The Complete Reference

About this ebook

Christopher Adamson

Related authors

Related to Star Schema The Complete Reference

Related ebooks

Databases For You

Related podcast episodes

Related articles

Related categories

Reviews for Star Schema The Complete Reference

What did you think?

Book preview

Star Schema The Complete Reference - Christopher Adamson

About the Author

Contents at a Glance

Contents

Acknowledgments

Introduction

About This Book

Organized Around Core Concepts

Comprehensive and Practical, Not Dogmatic

Architecture-Neutral

Product-Independent

Snowflakes and Cubes

Who Should Read This Book

Using This Book

Key Features of Each Chapter

Tips

Examples

Notes

Further Reading

Contents of This Book

Part I: Fundamentals

Part II: Multiple Stars

Part III: Dimension Design

Part IV: Fact Table Design

Part V: Performance

Part VI: Tools and Documentation

Where to Start

CHAPTER 1

Analytic Databases and Dimensional Design

Dimensional Design

Purpose

Measurement and Context

Facts and Dimensions

Grouping Dimensions and Facts

The Star Schema

Dimension Tables

Keys and History

Fact Tables

Using a Star Schema

Querying Facts