Nothing Special   »   [go: up one dir, main page]

Data Warehouse Components

Download as pdf or txt
Download as pdf or txt
You are on page 1of 8

2/21/2013

Topics Already Covered


Data Warehouse Architecture and its components

SS G515 - Data Warehousing: Introduction

Extraction,Transformation and Loading (ETL) Data Marts Approached to Design Data Warehouses
Inmons Kimballs

Dr. Yashvardhan Sharma Assistant Professor, CS & IS Dept. BITS-Pilani

A General Architecture for Data Warehousing

A General Architecture for Data Warehousing


The major components of data warehouse architecture are:
Source systems are where the data comes from. Extraction, transformation, and load (ETL) move data between

different data stores.

The central repository is the main store for the data warehouse. The metadata repository describes what is available and where. Data marts provide fast, specialised access for end users and

applications.

Operational feedback integrates decision support back into the

operational systems. place

End-users are the reason for developing the warehouse in the first

MOLAP: Multi-Dimensional On-Line Analytical Processing ROLAP: Relational On-Line Analytical Processing
3 4

Loading the Data Warehouse


Data is periodically extracted Data is cleansed and transformed

Data Warehousing Architecture


Monitoring & Administration
Metadata Repository

OLAP servers

Analysis Query/ Reporting


Serve

External Sources Operational dbs

Extract Transform Load Refresh

Users query the data warehouse Source Systems (OLTP) Data Staging Area Data Warehouse
6

Data Mining

Data Marts

2/21/2013

Data Warehousing Architecture

Data Warehouse Architecture

Data Warehouse COMPONENTS

Data Warehouse COMPONENTS


Source Data Component
Production Data. Internal Data. Archived Data. External Data.

Data Staging Component


Data Extraction Data Transformation. Data Loading.

10

Data Loading

Data Storage Component


Many of the data warehouses also employ multidimensional

database management systems. Data extracted from the data warehouse storage is aggregated in many ways and the summary data is kept in the multidimensional databases (MDDBs). Such multidimensional database systems are usually proprietary products.

11

12

2/21/2013

Information Delivery Component

Metadata Component
Metadata in a data warehouse is similar to a data dictionary,

but much more than a data dictionary.


Types of Metadata
Operational Metadata Extraction and Transformation Metadata End-User Metadata

More Details in Chapter 9.

13

14

Why Meta Data: Special Significance


First, it acts as the glue that connects all parts of the data

The architecture
Operational data source1

warehouse. Next, it provides information about the contents and structures to the developers. Finally, it opens the door to the end-users and makes the contents recognizable in their own terms.

Meta-data Operational data source 2 Lightly summarized data

High summarized data

Query Load Manager

Reporting, query, application development, and EIS(executive information system) Manage tools

Operational data source n

Detailed data

DBMS

OLAP(online analytical processing) tools

Operational data store (ods) Warehouse Manager

Operational data store (ODS)


Data mining

Archive/backup data

End-user access tools

15

Typical architecture of a data warehouse

The main components


Operational data sourcesfor the DW is supplied from mainframe
operational data held in first generation hierarchical and network databases, departmental data held in proprietary file systems, private data held on workstaions and private serves and external systems such as the Internet, commercially available DB, or DB assoicated with and organizations suppliers or customers

The main components


load manageralso called the frontend component, it performs all the operations associated with the extraction and loading of data into the warehouse. These operations include simple transformations of the data to prepare the data for entry into the warehouse warehouse managerperforms all the operations associated with the management of the data in the warehouse. The operations performed by this component include analysis of data to ensure consistency, transformation and merging of source data, creation of indexes and views, generation of denormalizations and aggregations, and archiving and backing-up data

Operational datastore(ODS)is a repository of current and


integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact simply act as a staging area for data to be moved into the warehouse

2/21/2013

The main components


query manageralso called backend component, it performs all the operations associated with the management of user queries. The operations performed by this component include directing queries to the appropriate tables and scheduling the execution of queries detailed, lightly and lightly summarized data,archive/backup data meta-data end-user access toolscan be categorized into five main groups: data reporting and query tools, application development tools, executive information system (EIS) tools, online analytical processing (OLAP) tools, and data mining tools

Data flows
Inflow- The processes associated with the extraction, cleansing, and

loading of the data from the source systems into the data warehouse.
upflow- The process associated with adding value to the data in the

warehouse through summarizing, packaging , packaging, and distribution of the data


downflow- The processes associated with archiving and backing-up

of data in the warehouse


outflow- The process associated with making the data availabe to the

end-users
Meta-flow- The processes associated with the management of the

meta-data

Tools and Technologies


Operational data source1

Warehouse Manager Meta-flow


Meta-data High summarized data

Reporting, query,application development, and EIS (executive information system) tools

The critical steps in the construction of a data

warehouse:

Inflow Load Manager


Operational data source n Detailed data Lightly summarized data

Outflow OLAP (online analytical processing) tools

Upflow

DBMS

Query Manage

a. Extraction b. Cleansing c. Transformation


after the critical steps, loading the results into target

Warehouse Manager
Operational data store (ods)

Data mining tools Downflow Archive/backup data End-user access tools

system can be carried out either by separate products, or by a single, categories:


code generators database data replication tools dynamic transformation engines

Information flows of a data warehouse

Populating & Refreshing the Warehouse


Data Extraction Data Cleaning Data Transformation

ETL Process : Issues & Challenges


Consumes 70-80% of project time Heterogeneous Source Systems Little or no control over source systems Source systems scattered Source systems operating in different time zones Different currencies Different measurement units Data not captured by OLTP systems Ensuring data quality

Convert from legacy/host format to warehouse format Sort, summarize, consolidate, compute views, check integrity, build indexes, partition Bring new data from source systems

Load

Refresh

2/21/2013

Data Staging Area


A storage area where extracted data is Cleaned Transformed Deduplicated Initial storage for data Need not be based on Relational model Spread over a number of machines Mainly sorting and Sequential processing COBOL or C code running against flat files Does not provide data access to users Analogy kitchen of a restaurant

Presentation Servers
A target physical machine on which DW data is organized for Direct querying by end users using OLAP Report writers Data Visualization tools Data mining tools Data stored in Dimensional framework Analogy Sitting area of a restaurant

Data Cleaning
Why?

Soundex Algorithms
Misspelled terms For example NAMES Phonetic algorithms can find similar sounding names Based on the six phonetic classifications of human speech sounds

Data warehouse contains data that is analyzed for business decisions More data and multiple sources could mean more errors in the data and harder to trace such errors Results in incorrect analysis

Detecting data anomalies and rectifying them early has huge payoffs Long Term Solution

Change business practices and data entry tools Repository for meta-data

Data Warehouse Design


OLTP Systems are Data Capture Systems DATA IN systems DW are DATA OUT systems

Analyzing the DATA


Active Analysis User Queries User-guided data analysis Show me how X varies with Y OLAP Automated Analysis Data Mining Whats in there? Set the computer FREE on your data Supervised Learning (classification) Unsupervised Learning (clustering)

OLTP

DW

2/21/2013

OLAP Queries
How much of product P1 was sold in 2009 state wise? Top 5 selling products in 2010 Total Sales in Q1 of FY 2008-09? Color wise sales figure of cars from 2008 to 2010 Model wise sales of cars for the month of Jan from 2006 to

Data Mining Investigations


Which type of customers are more likely to spend most

with us in the coming year?

What additional products are most likely to be sold to

customers who buy sportswear? year?

In which area should we open a new store in the next What are the characteristics of customers most likely to

2010

default on their loans before the year is out?

Continuum of Analysis
Specialized Algorithms

Data Marts
What is a data mart? Advantages and disadvantages of data marts Issues with the development and management of data marts

SQL

OLTP
Primitive & Canned Analysis

OLAP
Complex Ad-hoc Analysis

Data Mining
Automated Analysis

34

21-Feb-13

Data Marts
A subset of a data warehouse that supports the requirements

Data Marts
Data Mart: A scaled-down version of the data warehouse A data mart is a small warehouse designed for the

of a particular department or business process Data Mart is a subset of corporate-wide data warehouse that is of value to a specific groups of users. Its scope is confined to specific, selected groups, such as marketing data mart. Characteristics include:
Does not always contain detailed data unlike data warehouses More easily understood and navigated Can be dependent or independent

department level.
It is often a way to gain entry and provide an opportunity to

learn
Major problem: if they differ from department to

department, they can be difficult to integrate enterprisewide

35

21-Feb-13

36

2/21/2013

Reasons for Creating Data Marts


Proof of Concept for the DW Can be developed quickly and less resource intensive than DW To give users access to data they need to analyze most often To improve query response time due to reduction in the volume

Kimball vs Inmon
Bill Inmon's paradigm: Data warehouse is one part of

the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form.

of data to be accessed

Ralph Kimball's paradigm: Data warehouse is the

conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.

37

21-Feb-13

38

21-Feb-13

Kimball vs Inmon
Bill Inmon: Endorses a Top-Down design

Kimball vs Inmon: War of Words


"...The data warehouse is nothing more than the union of all the data marts...," Ralph Kimball, December 29, 1997. "You can catch all the minnows in the ocean and stack them together and they still do not make a whale," Bill Inmon, January 8, 1998.

Independent data marts cannot comprise an effective EDW. Organizations must focus on building EDW
Ralph Kimball: Endorses a Bottom-Up design

EDW effectively grows up around many of the several independent data marts such as for sales, inventory, or marketing

39

21-Feb-13

40

21-Feb-13

Kimball vs. Inmon


There is no right or wrong between these two ideas, as they represent different data warehousing philosophies. In reality, the data warehouse in most enterprises are closer to Ralph Kimball's idea. This is because most data warehouses started out as a departmental effort, and hence they originated as a data mart. Only when more data marts are built later do they evolve into a data warehouse.

Data Warehousing Process


Enterprise-wide warehouse, top down, the Inmon

methodology
Data mart, bottom up, the Kimball methodology When properly executed, both result in an enterprise-wide

data warehouse

41

21-Feb-13

42

2/21/2013

Data warehouse versus data mart.

Building a Data Warehouse


Questions to be asked:
Top-down or bottom-up approach? Enterprise-wide or departmental? Which firstdata warehouse or data mart? Build pilot or go with a full-fledged implementation? Dependent or independent data marts?

43

44

You might also like