Data Warehouse Components
Extraction, Transformation and Loading (ETL)
Data Marts
Approaches to Design Data Warehouses
Inmon's
Kimball's
The central repository is the main store for the data warehouse. The metadata repository describes what is available and where. Data marts provide fast, specialised access for end users and
applications.
End-users are the reason for developing the warehouse in the first place.
MOLAP: Multi-Dimensional On-Line Analytical Processing
ROLAP: Relational On-Line Analytical Processing
Figure: Data warehouse components. Source Systems (OLTP) → Data Staging Area → Data Warehouse → OLAP servers, Data Marts, and Data Mining; users query the data warehouse.
2/21/2013
Data Loading
… database management systems. Data extracted from the data warehouse storage is aggregated in many ways, and the summary data is kept in multidimensional databases (MDDBs). Such multidimensional database systems are usually proprietary products.
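The "aggregated in many ways" step can be pictured as precomputing a summary for every combination of dimensions. The following is a minimal Python sketch, assuming a toy detail table with hypothetical dimensions `product`, `state`, and `year`; it is not how any particular MDDB product stores its data. The string `"ALL"` marks a dimension that has been rolled up, so summary questions become dictionary lookups.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical detail rows from warehouse storage: (product, state, year, amount)
DETAIL_ROWS = [
    ("P1", "CA", 2009, 100.0),
    ("P1", "NY", 2009, 50.0),
    ("P2", "CA", 2009, 70.0),
    ("P1", "CA", 2010, 30.0),
]
DIMENSIONS = ("product", "state", "year")


def build_cube(rows):
    """Precompute sums for every subset of dimensions; "ALL" marks a rolled-up dimension."""
    cube = defaultdict(float)
    n = len(DIMENSIONS)
    for *dims, amount in rows:
        # Each detail row contributes to one cell per subset of kept dimensions
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                key = tuple(dims[i] if i in kept else "ALL" for i in range(n))
                cube[key] += amount
    return dict(cube)


cube = build_cube(DETAIL_ROWS)
print(cube[("P1", "ALL", 2009)])    # 150.0 (product P1, 2009, all states)
print(cube[("ALL", "ALL", "ALL")])  # 250.0 (grand total)
```

Each detail row lands in 2^n cells for n dimensions, so full pre-aggregation grows quickly as dimensions are added.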
Metadata Component
Metadata in a data warehouse is similar to the data dictionary of a database management system.
… warehouse. Next, it provides information about the contents and structures to the developers. Finally, it opens the door to the end-users and makes the contents recognizable in their own terms.
The architecture
Figure: The architecture (operational data sources; DBMS holding detailed data and archive/backup data; reporting, query, application development, and EIS (executive information system) tools).
Data flows
Inflow: The processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.
Upflow: The processes associated with adding value to the data in the warehouse for the end-users.
Meta-flow: The processes associated with the management of the metadata.
Figure: Data flows (operational data store (ODS), warehouse manager, query manager, DBMS).
Load: Convert from legacy/host format to warehouse format; sort, summarize, consolidate, compute views, check integrity, build indexes, partition.
Refresh: Bring new data from source systems.
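A minimal Python sketch of the Load and Refresh steps. The legacy record layout (`customer_id|DDMMYYYY|amount`), the non-negative-amount integrity rule, and the date watermark used for refresh are all hypothetical; an in-memory list stands in for the warehouse table. Summarizing, building indexes, and partitioning are left out.

```python
from datetime import date

# Hypothetical legacy/host records: "customer_id|DDMMYYYY|amount"
SOURCE = [
    "C002|15012009|250.00",
    "C001|03012009|100.50",
    "BAD-RECORD",
    "C003|20022009|-5.00",
]


def convert(record):
    """Convert one legacy record to warehouse format, or return None if it fails integrity checks."""
    parts = record.split("|")
    if len(parts) != 3:
        return None
    cust, raw_date, raw_amount = parts
    try:
        day = date(int(raw_date[4:]), int(raw_date[2:4]), int(raw_date[:2]))
        amount = float(raw_amount)
    except ValueError:
        return None
    if amount < 0:  # assumed integrity rule: sales amounts are non-negative
        return None
    return {"customer_id": cust, "sale_date": day, "amount": amount}


def load(warehouse, records, since=None):
    """Convert, check, sort and append records; with `since`, only strictly newer rows are added (refresh)."""
    rows = [r for r in map(convert, records) if r is not None]
    if since is not None:
        rows = [r for r in rows if r["sale_date"] > since]
    rows.sort(key=lambda r: (r["sale_date"], r["customer_id"]))
    warehouse.extend(rows)
    # Newest sale date now in the warehouse: the watermark for the next refresh
    return max((r["sale_date"] for r in warehouse), default=since)


warehouse = []
watermark = load(warehouse, SOURCE)  # initial load: two valid rows
new_batch = SOURCE + ["C004|01032009|75.00"]
watermark = load(warehouse, new_batch, since=watermark)  # refresh: one new row
print(len(warehouse))  # 3
```

A date watermark is only one way to detect new data; rows arriving later with an already-loaded date would be missed, which is why real refresh processes often rely on change logs or timestamps set by the source system.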
Presentation Servers
A target physical machine on which DW data is organized for direct querying by end users using:
- OLAP
- Report writers
- Data visualization tools
- Data mining tools
Data is stored in a dimensional framework.
Analogy: the sitting area of a restaurant.
Data Cleaning
Why?
- The data warehouse contains data that is analyzed for business decisions.
- More data and multiple sources can mean more errors in the data, and such errors are harder to trace.
- Errors result in incorrect analysis.
- Detecting data anomalies and rectifying them early has huge payoffs.
Long Term Solution
- Change business practices and data entry tools.
- Repository for metadata.
Soundex Algorithms
- Misspelled terms, for example names.
- Phonetic algorithms can find similar-sounding names.
- Based on the six phonetic classifications of human speech sounds.
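A sketch of American Soundex, the best-known phonetic algorithm of this kind, in Python. It assumes ASCII names. Consonants fall into six digit classes (1 to 6), and the code is the first letter plus three digits, padded with zeros. Adjacent letters with the same digit are coded once, including across H and W, while a vowel between them causes the digit to be coded again.

```python
# Digit classes of American Soundex (vowels, H, W and Y get no digit)
_CODES = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for an ASCII name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")  # the first letter's own digit suppresses a duplicate
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code:
            if code != prev:
                digits.append(code)
            prev = code
        elif ch not in "HW":
            prev = ""  # a vowel (or Y) separates letters, so a repeated digit is coded again
        # H and W keep `prev`, so same-digit letters on either side merge
    return (first + "".join(digits) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"), soundex("Rubin"))  # R163 R163 R150
```

Matching on the code rather than the spelling lets "Robert" and "Rupert" be flagged as possible duplicates during cleaning; the matches still need review, since unrelated names can share a code.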
OLAP Queries
How much of product P1 was sold in 2009, state-wise?
Top 5 selling products in 2010
Total sales in Q1 of FY 2008-09?
Color-wise sales figures of cars from 2008 to 2010
Model-wise sales of cars for the month of January from 2006 to 2010
In which area should we open a new store in the next …
What are the characteristics of customers most likely to …
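The first two questions map directly onto SQL aggregation. A minimal sketch using Python's built-in `sqlite3` module, assuming a hypothetical flat table `sales(product, state, year, amount)` with made-up rows; a real warehouse would typically query a fact table joined to dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, state TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("P1", "CA", 2009, 100.0), ("P1", "NY", 2009, 50.0), ("P1", "CA", 2009, 25.0),
        ("P2", "CA", 2010, 80.0), ("P1", "TX", 2010, 40.0), ("P3", "NY", 2010, 60.0),
    ],
)

# How much of product P1 was sold in 2009, state-wise?
p1_by_state = conn.execute(
    "SELECT state, SUM(amount) FROM sales "
    "WHERE product = 'P1' AND year = 2009 GROUP BY state ORDER BY state"
).fetchall()

# Top 5 selling products in 2010, ranked by total amount
top5_2010 = conn.execute(
    "SELECT product, SUM(amount) AS total FROM sales WHERE year = 2010 "
    "GROUP BY product ORDER BY total DESC LIMIT 5"
).fetchall()

print(p1_by_state)  # [('CA', 125.0), ('NY', 50.0)]
print(top5_2010)    # [('P2', 80.0), ('P3', 60.0), ('P1', 40.0)]
```

The two data mining questions have no such direct query; they need models built over the data rather than a GROUP BY.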
Continuum of Analysis
- OLTP (SQL): primitive and canned analysis
- OLAP: complex ad-hoc analysis
- Data mining (specialized algorithms): automated analysis
Data Marts
- What is a data mart?
- Advantages and disadvantages of data marts
- Issues with the development and management of data marts
Data Marts
A subset of a data warehouse that supports the requirements of a particular department or business process.
Data Mart: A scaled-down version of the data warehouse. A data mart is a small warehouse designed for the department level.
A data mart is a subset of the corporate-wide data warehouse that is of value to a specific group of users. Its scope is confined to specific, selected groups, as in a marketing data mart. Characteristics include:
- Does not always contain detailed data, unlike data warehouses
- More easily understood and navigated
- Can be dependent or independent
It is often a way to gain entry and provide an opportunity to learn.
Major problem: if they differ from department to department …
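A dependent data mart can be pictured as a departmental subset of the warehouse, often summarized. A minimal Python sketch, assuming hypothetical warehouse rows of the form `(order_id, department, region, month, revenue)`; the marketing mart keeps only marketing rows and drops order-level detail down to totals per region and month.

```python
from collections import defaultdict

# Hypothetical detailed warehouse rows: (order_id, department, region, month, revenue)
WAREHOUSE = [
    (1, "marketing", "East", "2013-01", 120.0),
    (2, "marketing", "East", "2013-01", 80.0),
    (3, "finance", "West", "2013-01", 500.0),
    (4, "marketing", "West", "2013-02", 60.0),
]


def build_dependent_mart(rows, department):
    """Subset the warehouse for one department and summarize away order-level detail."""
    totals = defaultdict(float)
    for _order_id, dept, region, month, revenue in rows:
        if dept == department:
            totals[(region, month)] += revenue
    return dict(sorted(totals.items()))


marketing_mart = build_dependent_mart(WAREHOUSE, "marketing")
print(marketing_mart)
# {('East', '2013-01'): 200.0, ('West', '2013-02'): 60.0}
```

Because this mart is derived from the warehouse, its definitions match the warehouse's. An independent mart loads directly from source systems instead, which is where departmental definitions can start to diverge.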
Kimball vs Inmon
Bill Inmon's paradigm: The data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in third normal form (3NF).
Ralph Kimball's paradigm: The data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.
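The dimensional model is usually realized as a star schema: a central fact table whose rows point to dimension tables. A minimal sketch with Python's `sqlite3`, assuming hypothetical `dim_product`, `dim_date`, and `fact_sales` tables; in Inmon's approach the same facts would first live in normalized 3NF tables and be reshaped like this only in the data marts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, color TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    units INTEGER,
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Sedan', 'Red'), (2, 'Sedan', 'Blue'), (3, 'Coupe', 'Red');
INSERT INTO dim_date VALUES (20090101, 2009, 1), (20100401, 2010, 2);
INSERT INTO fact_sales VALUES
    (1, 20090101, 2, 40000.0),
    (2, 20090101, 1, 21000.0),
    (3, 20100401, 1, 30000.0);
""")

# Color-wise units sold per year: join the fact table to its dimensions, then group
color_by_year = conn.execute("""
    SELECT p.color, d.year, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.color, d.year
    ORDER BY p.color, d.year
""").fetchall()
print(color_by_year)  # [('Blue', 2009, 1), ('Red', 2009, 2), ('Red', 2010, 1)]
```

Every analytical question follows the same shape (filter and group by dimension attributes, aggregate the facts), which is what makes the dimensional model easy for end users to navigate.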
Bill Inmon: Endorses a top-down design
Independent data marts cannot comprise an effective enterprise data warehouse (EDW). Organizations must focus on building the EDW.
Ralph Kimball: Endorses a bottom-up design
The EDW effectively grows up around several independent data marts, such as those for sales, inventory, or marketing.
EDW, top down, the Inmon methodology
Data mart, bottom up, the Kimball methodology
When properly executed, both result in an enterprise-wide data warehouse.