DW Unit-1 (1) XXXXXXXX

The document discusses data warehousing and OLAP. It defines a data warehouse as a subject-oriented, integrated, time-variant and non-volatile collection of data used for decision making. It distinguishes data warehouses from operational databases by describing how data warehouses integrate historical data from multiple sources for analysis rather than transaction processing. It also describes common OLAP operations like roll-up and drill-down that allow users to aggregate and navigate multidimensional data cubes.

Uploaded by

Dhananjay Jahagirdar

Copyright

© All Rights Reserved

Data Warehousing-Unit 1

Overview
 The term "Data Warehouse" was first coined by
Bill Inmon in 1990.
 According to Inmon, a data warehouse is a
subject-oriented, integrated, time-variant, and
non-volatile collection of data.
 This data helps analysts make informed
decisions in an organization.
Data, Data everywhere yet ...

 We can’t find the data we need: data is scattered over the
network, in many versions with subtle differences.
 We can’t understand the data we found: the available
data is poorly documented.
 We can’t use the data we found: results are
unexpected, and data needs to be transformed from one form
to another.
 For these reasons we need a single, complete and
consistent store of data, obtained from a variety of
different sources and made available to end users
in a form they can understand and use in a business
context.
 Data warehousing was introduced to meet this need:
it is a process of transforming data into
information and making it available to users in a timely
enough manner to make a difference.
Understanding a Data Warehouse

A data warehouse is a database that is maintained
separately from an organization’s operational databases.
These systems allow for the integration of a variety of
application systems.
They support information processing by providing a solid
platform of consolidated historical data for analysis.
A data warehouse is a database, which is kept separate
from the organization's operational database.
It possesses consolidated historical data, which helps the
organization to analyze its business.
A data warehouse helps executives to organize,
understand, and use their data to make strategic decisions.
Data warehouse systems help in the integration of
a diverse range of application systems.
Operational vs. Informational Systems

 Operational systems, as their name implies, are the
systems that support the everyday operation of the
enterprise.
 These are the backbone systems of any enterprise, and
include order entry, inventory, manufacturing, payroll and
accounting.
 Due to their importance to the organization, operational
systems were almost always the first parts of the
enterprise to be computerized.
 They are OLTP systems: they run mission-critical applications
and must meet stringent performance requirements for the
routine tasks used to run a business.
 Informational systems deal with analyzing data and
making decisions, often major, about how the enterprise
will operate now, and in the future.
 Not only do informational systems have a different focus
from operational ones, they often have a different scope.
 Where operational data needs are normally focused
upon a single area, informational data needs often span
a number of different areas and need large amounts of
related operational data.
Why a Data Warehouse is Separated from
Operational Databases
An operational database is constructed for well-known
tasks and workloads such as searching particular records,
indexing, etc. In contrast, data warehouse queries are often
complex and they present a general form of data.
Operational databases support concurrent processing of
multiple transactions. Concurrency control and recovery
mechanisms are required for operational databases to
ensure robustness and consistency of the database.
An operational database query allows both read and modify
operations, while an OLAP query needs only read-only
access to stored data.
An operational database maintains current data. On the
other hand, a data warehouse maintains historical data.
Deﬁnition and Characteristics

A data warehouse is a
• subject-oriented
• integrated
• time-variant
• non-volatile
collection of data that is used primarily in
organizational decision making.
 The four keywords, subject-oriented, integrated,
time-variant, and nonvolatile, distinguish data warehouses
from other data repository systems, such as relational
database systems, transaction processing systems,
and file systems.
Subject-oriented
 A data warehouse is organized around major subjects,
such as customer, supplier, product, and sales.
 Rather than concentrating on the day-to-day
operations and transaction processing of an
organization, a data warehouse focuses on the
modeling and analysis of data for decision makers.
 Hence, data warehouses typically provide a simple and
concise view around particular subject issues by
excluding data that are not useful in the decision
support process.
Integrated:
 A data warehouse is usually constructed by integrating
multiple heterogeneous sources, such as relational
databases, flat files, and on-line transaction records.
 Data cleaning and data integration techniques are
applied to ensure consistency in naming conventions,
encoding structures, attribute measures, and so on.

Time-variant:
 Data are stored to provide information from a
historical perspective (e.g., the past 5–10 years). Every
key structure in the data warehouse contains, either
implicitly or explicitly, an element of time.
Nonvolatile:
 A data warehouse is always a physically separate store
of data transformed from the application data found in
the operational environment.
 Due to this separation, a data warehouse does not
require transaction processing, recovery, and
concurrency control mechanisms.
 It usually requires only two operations in data
accessing: initial loading of data and access of data.
Differences between Operational Database Systems and
Data Warehouses

 The major task of on-line operational database systems is
to perform on-line transaction and query processing.
These systems are called on-line transaction processing
(OLTP) systems.
 They cover most of the day-to-day operations of an
organization, such as purchasing, inventory,
manufacturing, banking, payroll, registration, and
accounting.
 Data warehouse systems, on the other hand, serve users
or knowledge workers in the role of data analysis and
decision making. Such systems can organize and present
data in various formats in order to accommodate the
diverse needs of the different users. These systems are
known as on-line analytical processing (OLAP) systems.
 The major distinguishing features between OLTP and OLAP
are summarized as follows:

Users and system orientation:

 An OLTP system is customer-oriented and is used for
transaction and query processing by clerks, clients, and
information technology professionals.
 An OLAP system is market-oriented and is used for data
analysis by knowledge workers, including managers,
executives, and analysts.
Data contents:
 An OLTP system manages current data that, typically, are too
detailed to be easily used for decision making.
 An OLAP system manages large amounts of historical data,
provides facilities for summarization and aggregation, and stores
and manages information at different levels of granularity.
These features make the data easier to use in informed
decision making.
Database design:
 An OLTP system usually adopts an entity-relationship (ER)
data model and an application-oriented database design.
 An OLAP system typically adopts either a star or snowﬂake
model and a subject-oriented database design.

View:
 An OLTP system focuses mainly on the current data within an
enterprise or department, without referring to historical data
or data in different organizations.
 In contrast, an OLAP system often spans multiple versions of
a database schema, due to the evolutionary process of an
organization. OLAP systems also deal with information that
originates from different organizations, integrating information
from many data stores. Because of their huge volume, OLAP
data are stored on multiple storage media.
Access patterns:
 The access patterns of an OLTP system consist mainly of
short, atomic transactions. Such a system requires
concurrency control and recovery mechanisms.
 However, accesses to OLAP systems are mostly read-only
operations (because most data warehouses store historical
rather than up-to-date information), although many could be
complex queries.
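
The contrast in access patterns can be sketched with Python's built-in sqlite3 module. This is only an illustration: the account table, its rows, and the function names are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical in-memory table standing in for an operational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """OLTP-style access: a short, atomic read-and-modify transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )


def total_balance(conn):
    """OLAP-style access: a read-only aggregate over many rows."""
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]


transfer(conn, 1, 2, 30)
balances = dict(conn.execute("SELECT id, balance FROM account"))
total = total_balance(conn)
```

The transfer needs atomicity and recovery (both updates happen or neither does), while the aggregate only reads, which is why a store serving mostly the second kind of query can relax those mechanisms.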
OLAP Operations in the Multidimensional Data Model

 In the multidimensional model, data are organized into
multiple dimensions, and each dimension contains
multiple levels of abstraction defined by concept
hierarchies.
 This organization provides users with the ﬂexibility to
view data from different perspectives.
 A number of OLAP data cube operations exist to
materialize these different views, allowing interactive
querying and analysis of the data at hand.
Roll-up:

 The roll-up operation (also called the drill-up operation by
some vendors) performs aggregation on a data cube, either
by climbing up a concept hierarchy for a dimension or by
dimension reduction.
 For example, take a location hierarchy defined as the total
order “street < city < province or state < country.” A roll-up
on location aggregates the data by ascending the hierarchy
from the level of city to the level of country.
 In other words, rather than grouping the data by city, the
resulting cube groups the data by country.
 When roll-up is performed by dimension reduction, one or
more dimensions are removed from the given cube.
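
Both forms of roll-up can be sketched in a few lines of Python. The sales figures, the city-to-country mapping, and the function names below are made up for illustration; a real OLAP server would work on stored cuboids rather than a list of tuples.

```python
from collections import defaultdict

# Hypothetical sales facts: (city, quarter, dollars_sold).
SALES = [
    ("Vancouver", "Q1", 100),
    ("Toronto", "Q1", 200),
    ("Chicago", "Q1", 300),
    ("New York", "Q1", 400),
    ("Vancouver", "Q2", 150),
    ("Chicago", "Q2", 250),
]

# One step of the location concept hierarchy: city < country.
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Chicago": "USA",
    "New York": "USA",
}


def roll_up_location(facts):
    """Climb the hierarchy: regroup city-level cells by country."""
    cube = defaultdict(int)
    for city, quarter, dollars in facts:
        cube[(CITY_TO_COUNTRY[city], quarter)] += dollars
    return dict(cube)


def roll_up_drop_location(facts):
    """Dimension reduction: remove the location dimension entirely."""
    cube = defaultdict(int)
    for _city, quarter, dollars in facts:
        cube[quarter] += dollars
    return dict(cube)


by_country = roll_up_location(SALES)
by_quarter = roll_up_drop_location(SALES)
```

Climbing the hierarchy keeps the location dimension at a coarser level, whereas dimension reduction leaves a cube with one dimension fewer.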
Drill-down

 Drill-down is the reverse of roll-up. It navigates from less
detailed data to more detailed data.
 Drill-down can be realized by either stepping down a concept
hierarchy for a dimension or introducing additional
dimensions. Figure 3.10 shows the result of a drill-down
operation performed on the central cube by stepping down a
concept hierarchy for time deﬁned as “day < month < quarter
< year.”
 Drill-down occurs by descending the time hierarchy from the
level of quarter to the more detailed level of month. The
resulting data cube details the total sales per month rather
than summarizing them by quarter.
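
A minimal sketch of drilling down from quarter to month follows, assuming the month-level facts are still available (a summary alone cannot be expanded). The figures and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical base facts at the month level: (month, dollars_sold).
MONTHLY_SALES = [
    ("Jan", 10), ("Feb", 20), ("Mar", 30),
    ("Apr", 40), ("May", 50), ("Jun", 60),
]

# One step of the time concept hierarchy: month < quarter.
MONTH_TO_QUARTER = {
    "Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
    "Apr": "Q2", "May": "Q2", "Jun": "Q2",
}


def aggregate(facts, level):
    """Total sales at the requested level of the time hierarchy."""
    if level not in ("month", "quarter"):
        raise ValueError(f"unknown level: {level}")
    cube = defaultdict(int)
    for month, dollars in facts:
        key = month if level == "month" else MONTH_TO_QUARTER[month]
        cube[key] += dollars
    return dict(cube)


def drill_down_quarter(facts, quarter):
    """Expand one quarter cell into its month-level cells."""
    monthly = aggregate(facts, "month")
    return {m: d for m, d in monthly.items() if MONTH_TO_QUARTER[m] == quarter}


by_quarter = aggregate(MONTHLY_SALES, "quarter")       # summarized view
q1_by_month = drill_down_quarter(MONTHLY_SALES, "Q1")  # more detailed view
```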
Slice and dice

 The slice operation performs a selection on one dimension of
the given cube, resulting in a subcube.
 The figure shows a slice operation where the sales data are
selected from the central cube for the dimension time using
the criterion time = “Q1”.
 The dice operation deﬁnes a subcube by performing a
selection on two or more dimensions.
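
Slice and dice can be sketched as filters over cube cells. The cube contents, dimension names, and helper functions here are hypothetical. This sketch also keeps the selected coordinate in each cell, whereas some presentations of slice drop the fixed dimension from the result.

```python
# Hypothetical cube cells keyed by (time, location, item) -> dollars sold.
CUBE = {
    ("Q1", "Toronto", "computer"): 5,
    ("Q1", "Vancouver", "phone"): 7,
    ("Q2", "Toronto", "computer"): 9,
    ("Q2", "Vancouver", "security"): 11,
}
DIMS = ("time", "location", "item")


def dice(cube, **criteria):
    """Keep cells whose coordinate on each named dimension is in the allowed set."""
    unknown = set(criteria) - set(DIMS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")

    def keep(coords):
        return all(
            coords[DIMS.index(dim)] in allowed for dim, allowed in criteria.items()
        )

    return {coords: value for coords, value in cube.items() if keep(coords)}


def slice_(cube, dim, value):
    """A slice is a selection on exactly one dimension."""
    return dice(cube, **{dim: {value}})


q1 = slice_(CUBE, "time", "Q1")
toronto_h1 = dice(CUBE, time={"Q1", "Q2"}, location={"Toronto"})
```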
Pivot (rotate)

 Pivot (also called rotate) is a visualization operation that
rotates the data axes in view in order to provide an alternative
presentation of the data.
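
For a two-dimensional view, pivoting amounts to swapping the row and column axes. A small sketch with hypothetical labels and values:

```python
# Hypothetical 2-D view of a cube slice: rows are locations, columns are items.
ROWS = ["Toronto", "Vancouver"]
COLS = ["computer", "phone"]
VALUES = [
    [5, 0],  # Toronto
    [3, 7],  # Vancouver
]


def pivot(rows, cols, values):
    """Swap the two axes: the same cells, presented with items as rows."""
    rotated = [list(column) for column in zip(*values)]
    return cols, rows, rotated


new_rows, new_cols, rotated = pivot(ROWS, COLS, VALUES)
```

No cell value changes; only the presentation does, which is why pivot is described as a visualization operation.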
Steps for the Design and Construction of Data
Warehouses

 To design an effective data warehouse we need to
understand and analyze business needs and construct a
business analysis framework.
 The construction of a large and complex information
system can be viewed as the construction of a large and
complex building, for which the owner, architect, and
builder have different views.
 These views are combined to form a complex
framework that represents the top-down, business-
driven, or owner’s perspective, as well as the bottom-up,
builder-driven, or implementor’s view of the information
system.
 Four different views regarding the design of a data
warehouse must be considered: the top-down view, the data
source view, the data warehouse view, and the business
query view.
 The top-down view allows the selection of the relevant
information necessary for the data warehouse. This
information matches the current and future business needs.
 The data source view exposes the information being
captured, stored, and managed by operational systems. This
information may be documented at various levels of detail
and accuracy, from individual data source tables to
integrated data source tables.
 Data sources are often modeled by traditional data
modeling techniques, such as the entity-relationship model
or CASE (computer-aided software engineering) tools.
 The data warehouse view includes fact tables and dimension
tables. It represents the information that is stored inside the
data warehouse, including precalculated totals and counts,
as well as information regarding the source, date, and time of
origin, added to provide historical context.
 Finally, the business query view is the perspective of data in
the data warehouse from the viewpoint of the end user.
The warehouse design process consists of the following steps.

 Choose a business process to model, for example, orders,
invoices, shipments, inventory, account administration, sales,
or the general ledger.
 If the business process is organizational and involves multiple
complex object collections, a data warehouse model should
be followed. However, if the process is departmental and
focuses on the analysis of one kind of business process, a
data mart model should be chosen.
 Choose the grain of the business process. The grain is the
fundamental, atomic level of data to be represented in the fact
table for this process, for example, individual transactions,
individual daily snapshots, and so on.
 Choose the dimensions that will apply to each fact table
record. Typical dimensions are time, item, customer, supplier,
warehouse, transaction type, and status.
 Choose the measures that will populate each fact table record.
Typical measures are numeric additive quantities like dollars
sold and units sold.
A Three-Tier Data Warehouse Architecture

 The bottom tier is a warehouse database server that is
almost always a relational database system. Back-end
tools and utilities are used to feed data into the bottom
tier from operational databases or other external sources
(such as customer proﬁle information provided by
external consultants).
 These tools and utilities perform data extraction,
cleaning, and transformation (e.g., to merge similar data
from different sources into a uniﬁed format), as well as
load and refresh functions to update the data warehouse.
 The data are extracted using application program
interfaces known as gateways. A gateway is supported
by the underlying DBMS and allows client programs to
generate SQL code to be executed at a server.
 The middle tier is an OLAP server that is typically
implemented using either a relational OLAP (ROLAP)
model, that is, an extended relational DBMS that maps
operations on multidimensional data to standard
relational operations; or a multidimensional OLAP
(MOLAP) model, that is, a special-purpose server that
directly implements multidimensional data and
operations.
 The top tier is a front-end client layer, which contains
query and reporting tools, analysis tools, and/or data
mining tools (e.g., trend analysis, prediction, and so on).
 From the architecture point of view, there are three data
warehouse models: the enterprise warehouse, the data mart,
and the virtual warehouse.
Enterprise warehouse:
 An enterprise warehouse collects all of the information about
subjects spanning the entire organization. It provides
corporate-wide data integration, usually from one or more
operational systems or external information providers, and is
cross-functional in scope.
 It typically contains detailed data as well as summarized data,
and can range in size from a few gigabytes to hundreds of
gigabytes, terabytes, or beyond. An enterprise data warehouse
may be implemented on traditional mainframes, computer
super servers, or parallel architecture platforms.
 It requires extensive business modeling and may take years to
design and build.
Data mart:

 A data mart contains a subset of corporate-wide data that is of
value to a specific group of users. The scope is confined to
specific selected subjects.
 For example, a marketing data mart may confine its subjects
to customer, item, and sales. The data contained in data marts
tend to be summarized.
 Depending on the source of data, data marts can be
categorized as independent or dependent.
 Independent data marts are sourced from data captured from
one or more operational systems or external information
providers, or from data generated locally within a particular
department or geographic area.
 Dependent data marts are sourced directly from enterprise
data warehouses.
Virtual warehouse:

 A virtual warehouse is a set of views over operational
databases. For efficient query processing, only some of the
possible summary views may be materialized.
 A virtual warehouse is easy to build but requires excess
capacity on operational database servers.
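
The idea of a warehouse built from views can be sketched with Python's sqlite3 module. The orders table and the view names are hypothetical. SQLite has no materialized views, so a snapshot table stands in for one here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical operational table.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 10), ("east", 15), ("west", 20)],
)

# The "warehouse" is only a view: each query runs against the operational table.
conn.execute(
    """CREATE VIEW sales_by_region AS
       SELECT region, SUM(amount) AS total FROM orders GROUP BY region"""
)

# Materializing a summary view means storing its result.
# Assumption: a snapshot table plays the role of a materialized view.
conn.execute("CREATE TABLE sales_by_region_mat AS SELECT * FROM sales_by_region")

# A later operational insert is visible through the view but not the snapshot.
conn.execute("INSERT INTO orders (region, amount) VALUES ('west', 5)")
live = dict(conn.execute("SELECT region, total FROM sales_by_region"))
snapshot = dict(conn.execute("SELECT region, total FROM sales_by_region_mat"))
```

Every query on the plain view is recomputed on the operational server, which is the excess capacity the slide refers to; the snapshot answers faster but has to be refreshed.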
Types of OLAP Servers
Relational OLAP (ROLAP) servers:
 These are the intermediate servers that stand in between a
relational back-end server and client front-end tools. They
use a relational or extended-relational DBMS to store and
manage warehouse data, and OLAP middleware to support
missing pieces.
 ROLAP servers include optimization for each DBMS back
end, implementation of aggregation navigation logic, and
additional tools and services.
 ROLAP technology tends to have greater scalability than
MOLAP technology. The DSS server of MicroStrategy, for
example, adopts the ROLAP approach.
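
How a ROLAP layer might map a multidimensional request onto relational operations can be sketched as a function that turns a list of aggregation levels into a GROUP BY query. The sales table, the level names, and cuboid_sql are hypothetical and do not represent any vendor's middleware.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relational table holding the warehouse data.
conn.execute(
    "CREATE TABLE sales (city TEXT, country TEXT, quarter TEXT, dollars INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Toronto", "Canada", "Q1", 200),
        ("Vancouver", "Canada", "Q1", 100),
        ("Chicago", "USA", "Q1", 300),
        ("Chicago", "USA", "Q2", 250),
    ],
)

ALLOWED_LEVELS = {"city", "country", "quarter"}


def cuboid_sql(levels):
    """Translate 'total dollars at these levels' into a GROUP BY query."""
    if not levels or set(levels) - ALLOWED_LEVELS:
        raise ValueError(f"unsupported levels: {levels}")
    # Column names come only from the whitelist above, never from user input.
    cols = ", ".join(levels)
    return f"SELECT {cols}, SUM(dollars) FROM sales GROUP BY {cols} ORDER BY {cols}"


sql = cuboid_sql(["country", "quarter"])
rows = conn.execute(sql).fetchall()
```

Requesting the levels country and quarter corresponds to a roll-up from city to country, expressed entirely in standard SQL.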
Multidimensional OLAP (MOLAP) servers:

 These servers support multidimensional views of data through
array-based multidimensional storage engines. They map
multidimensional views directly to data cube array structures.
 The advantage of using a data cube is that it allows fast
indexing to precomputed summarized data. Notice that with
multidimensional data stores, the storage utilization may be
low if the data set is sparse.
 Many MOLAP servers adopt a two-level storage
representation to handle dense and sparse data sets: denser
sub-cubes are identiﬁed and stored as array structures,
whereas sparse sub-cubes employ compression technology
for efﬁcient storage utilization.
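
The trade-off between array storage and sparse storage can be sketched as follows. The dimension members and values are hypothetical, and a plain dictionary of non-empty cells stands in for the compression techniques a real MOLAP server would use.

```python
# Hypothetical dimension members; their positions are the array coordinates.
TIMES = ["Q1", "Q2"]
CITIES = ["Toronto", "Vancouver", "Chicago"]


def offset(time, city):
    """Row-major position of a cell in a flat array."""
    return TIMES.index(time) * len(CITIES) + CITIES.index(city)


# Dense storage: one slot per possible cell, addressed by arithmetic alone.
dense = [0] * (len(TIMES) * len(CITIES))
dense[offset("Q1", "Toronto")] = 200
dense[offset("Q2", "Chicago")] = 250

# Sparse storage: keep only the non-empty cells, keyed by their offset.
sparse = {i: value for i, value in enumerate(dense) if value != 0}

# Fraction of the dense array that actually holds data.
utilization = len(sparse) / len(dense)
```

Array addressing gives direct access to any cell, but when most cells are empty the utilization is low, which motivates the two-level representation described above.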
Hybrid OLAP (HOLAP) servers:

 The hybrid OLAP approach combines ROLAP and MOLAP
technology, benefiting from the greater scalability of ROLAP
and the faster computation of MOLAP.
 For example, a HOLAP server may allow large volumes of
detail data to be stored in a relational database, while
aggregations are kept in a separate MOLAP store.
 Microsoft SQL Server 2000 supports a hybrid OLAP server.
GUIDELINES FOR DATA WAREHOUSE
IMPLEMENTATION

Implementation steps
Requirements analysis and capacity planning:
 The first step in data warehousing involves defining
enterprise needs, defining architecture, carrying out
capacity planning and selecting the hardware and
software tools.
 This step will involve consulting senior management as
well as the various stakeholders.

Hardware integration:
 Once the hardware and software have been selected, they
need to be put together by integrating the servers, the
storage devices and the client software tools.
Modelling:
 Modelling is a major step that involves designing the
warehouse schema and views. This may involve using a
modelling tool if the data warehouse is complex.
Physical modelling:
 For the data warehouse to perform efﬁciently, physical
modelling is required. This involves designing the physical
data warehouse organization, data placement, data
partitioning, deciding on access methods and indexing.
Sources:
 The data for the data warehouse is likely to come from a
number of data sources. This step involves identifying and
connecting the sources using gateways, ODBC drivers or other
wrappers.
ETL:
 The data from the source systems will need to go through an
ETL process. The step of designing and implementing the ETL
process may involve identifying a suitable ETL tool vendor.
 This may include customizing the tool to suit the needs of the
enterprise.
Populate the data warehouse:
 Once the ETL tools have been agreed upon, testing the tools
will be required, perhaps using a staging area.
 Once everything is working satisfactorily, the ETL tools may be
used in populating the warehouse given the schema and view
deﬁnitions.
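
The extract, clean, transform and load cycle that the ETL and population steps rely on can be sketched in plain Python. The two source extracts, the field names, the gender encoding table, and the staging table are all hypothetical; a real project would use the chosen ETL tool.

```python
import sqlite3

# Hypothetical extracts from two source systems with inconsistent conventions.
SOURCE_A = [{"cust": "Ann", "gender": "F", "amount": "10.50"}]
SOURCE_B = [{"customer_name": "bob ", "sex": "male", "amt": 4}]

# Unified encoding for an attribute that the sources encode differently.
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}


def transform(record, name_key, gender_key, amount_key):
    """Clean one source record and map it to the unified warehouse format."""
    return (
        record[name_key].strip().title(),
        GENDER_CODES[record[gender_key].strip().lower()],
        float(record[amount_key]),
    )


def load(conn, rows):
    """Load transformed rows into the staging table."""
    conn.executemany("INSERT INTO sales_staging VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_staging (customer TEXT, gender TEXT, amount REAL)"
)

rows = [transform(r, "cust", "gender", "amount") for r in SOURCE_A]
rows += [transform(r, "customer_name", "sex", "amt") for r in SOURCE_B]
load(conn, rows)

loaded = conn.execute("SELECT * FROM sales_staging ORDER BY customer").fetchall()
```

Testing against a staging table first, as the step suggests, lets naming and encoding mismatches surface before the warehouse itself is populated.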
User applications:
 For the data warehouse to be useful there must be end-user
applications. This step involves designing and implementing
applications required by the end users.
Roll-out the warehouse and applications:
 Once the data warehouse has been populated and the end-user
applications tested, the warehouse system and the applications
may be rolled out for the user community to use.
Implementation Guidelines

Build incrementally:

 Data warehouses must be built incrementally. Generally it is
recommended that a data mart may first be built with one
particular project in mind and once it is implemented a number
of other sections of the enterprise may also wish to implement
similar systems.
 An enterprise data warehouse can then be implemented in an
iterative manner allowing all data marts to extract information
from the data warehouse.
 Data warehouse modelling itself is an iterative methodology as
users become familiar with the technology and are then able to
understand and express their requirements more clearly.
Need a champion:

 A data warehouse project must have a champion who is
willing to carry out considerable research into expected costs
and beneﬁts of the project.
 Data warehousing projects require inputs from many units in
an enterprise and therefore need to be driven by someone
who is capable of interaction with people in the enterprise and
can actively persuade colleagues.
 Without the cooperation of other units, the data model for the
warehouse and the data required to populate the warehouse
may be more complicated than they need to be. Studies have
shown that having a champion can help adoption and success
of data warehousing projects.
Senior management support:

 A data warehouse project must be fully supported by the
senior management. Given the resource intensive nature of
such projects and the time they can take to implement, a
warehouse project calls for a sustained commitment from
senior management.
 This can sometimes be difﬁcult since it may be hard to
quantify the beneﬁts of data warehouse technology and the
managers may consider it a cost without any explicit return
on investment.
 Data warehousing project studies show that top
management support is essential for the success of a data
warehousing project.
Ensure quality:

 The data quality in the source systems is not always high and
often little effort is made to improve data quality in the
source systems. Improved data quality, when recognized by
Corporate strategy:

 A data warehouse project must fit with corporate strategy and
business objectives. The objectives of the project must be
clearly deﬁned before the start of the project.
 Given the importance of senior management support for a
data warehousing project, the ﬁtness of the project with the
corporate strategy is essential.

Business plan:

 The financial costs (hardware, software, and peopleware),
expected benefits and a project plan (including an ETL plan)
for a data warehouse project must be clearly outlined and
understood by all stakeholders.
 Without such understanding, rumors about expenditure and
beneﬁts can become the only source of information,
undermining the project.
Training:

 A data warehouse project must not overlook data warehouse
training requirements. For a data warehouse project to be
successful, the users must be trained to use the warehouse and
to understand its capabilities.
 Training of users and professional development of the project
team may also be required since data warehousing is a
complex task and the skills of the project team are critical to
the success of the project.

Adaptability:

 The project should build in adaptability so that changes may be
made to the data warehouse if and when required. Like any
system, a data warehouse will need to change, as needs of an
enterprise change.
 Furthermore, once the data warehouse is operational, new
applications using the data warehouse are almost certain to be
Joint management:

 The project must be managed by both IT and business
professionals in the enterprise. To ensure good
communication with the stakeholders and that the project is
focused on assisting the enterprise’s business, business
professionals must be involved in the project along with
technical professionals.
Data Warehouse Metadata
 Metadata is simply defined as data about data. The data
that are used to represent other data is known as
metadata.
 For example, the index of a book serves as metadata for
the contents of the book.
 In terms of a data warehouse, we can define metadata as
follows:
 Metadata is a roadmap to data warehouse.
 Metadata in data warehouse defines the warehouse
objects.
 Metadata acts as a directory. This directory helps the
decision support system to locate the contents of a data
warehouse.
Role of Metadata

Categories of Metadata
Metadata can be broadly categorized into three categories:
 Business Metadata - It has the data ownership
information, business deﬁnition, and changing policies.
 Technical Metadata - It includes database system names,
table and column names and sizes, data types and
allowed values. Technical metadata also includes
structural information such as primary and foreign key
attributes and indices.
 Operational Metadata - It includes currency of data and
data lineage. Currency of data means whether the data is
active, archived, or purged. Lineage of data means the
history of the data's migration and the transformations applied to it.
Data Warehouse Metadata
 The Kimball technical system architecture separates the
data and processes comprising the DW/BI system into
the backroom extract, transformation and load (ETL)
environment and the front room presentation area, as
illustrated in the following diagram.
Backroom ETL system

 The Kimball Group has identified 34 subsystems in the ETL
process flow, grouped into four major operations:
extracting the data from the sources,
performing cleansing and conforming transformations,
delivering it to the presentation server, and managing the ETL
process and back room environment.
Front room presentation area

 The Kimball Architecture presumes the data utilized by the BI
applications is dimensionally-structured, organized by business
process, atomically-grained (complemented by aggregated
summaries for performance tuning), and tied together by the
enterprise data warehouse bus architecture.
Front room BI applications

 The front room is the public face of the DW/BI system; it’s
what business users see and work with day-to-day.
 There’s a broad range of BI applications supported by BI
management services in the front room, including ad hoc
queries, standardized reports, dashboards and scorecards,
and more powerful analytic or mining/modeling applications.
Metadata

 Metadata is all the information that defines and describes the
structures, operations, and contents of the DW/BI system.
 Technical metadata deﬁnes the objects and processes which
comprise the DW/BI system.
 Business metadata describes the data warehouse contents in
user terms, including what data is available, where it
came from, what it means, and how it relates to other
data.
 Finally, process metadata describes the warehouse’s
operational results.
Characteristics of OLAP
1) Multidimensional Conceptual View

 User-analysts would view an enterprise as
being multidimensional in nature – for example, profits
could be viewed by region, product, time period, or
scenario (such as actual, budget, or forecast).
 Multi-dimensional data models enable more
straightforward and intuitive manipulation of data by
users, including slicing and dicing.
2) Transparency

 When OLAP forms part of the users’ customary
spreadsheet or graphics package, this should be
transparent to the user.
 OLAP should be part of an open systems architecture
which can be embedded in any place desired by the user
without adversely affecting the functionality of the host
tool.
 The user should not be exposed to the source of the
data supplied to the OLAP tool, which may be
homogeneous or heterogeneous.
3) Accessibility

 The OLAP tool should be capable of applying its own
logical structure to access heterogeneous sources of
data and perform any conversions necessary to present
a coherent view to the user.
 The tool (and not the user) should be concerned with
where the physical data comes from.
4) Consistent reporting performance

 Performance of the OLAP tool should not suffer
significantly as the number of dimensions is increased.

5) Client/server architecture

 The server component of OLAP tools should be
sufficiently intelligent that the various clients can be
attached with minimum effort. The server should be
capable of mapping and consolidating data between
disparate databases.
6) Generic Dimensionality

 Every data dimension should be equivalent in its
structure and operational capabilities.

7) Dynamic sparse matrix handling

 The OLAP server’s physical structure should have
optimal sparse matrix handling.

8) Multi-user support

 OLAP tools must provide concurrent retrieval and
update access, integrity and security.
9) Unrestricted cross-dimensional operations

 Computational facilities must allow calculation and
data manipulation across any number of data dimensions,
and must not restrict any relationship between data cells.

10) Intuitive data manipulation

 Data manipulation inherent in the consolidation path,
such as drilling down or zooming out, should be
accomplished via direct action on the analytical
model’s cells, and not require use of a menu or
multiple trips across the user interface.
11) Flexible reporting

 Reporting facilities should present information in any
way the user wants to view it.

12) Unlimited dimensions and aggregation levels

 The number of data dimensions supported should, to
all intents and purposes, be unlimited.
 Each generic dimension should enable an essentially
unlimited number of user-defined aggregation levels
within any given consolidation path.
Multidimensional Data Model
 The most popular data model for a data warehouse is a
multidimensional model. Such a model can exist in the
form of a star schema, a snowflake schema, or a fact
constellation schema.
Star schema
 The most common modeling paradigm is the star schema,
in which the data warehouse contains (1) a large central
table (fact table) containing the bulk of the data, with no
redundancy, and (2) a set of smaller attendant tables
(dimension tables), one for each dimension.
 The schema graph resembles a starburst, with the
dimension tables displayed in a radial pattern around the
central fact table.
 A star schema for AllElectronics sales is shown in the figure. Sales are considered along four dimensions, namely, time, item, branch, and location.
 The schema contains a central fact table for sales that contains keys to each of the four dimensions, along with two measures: dollars sold and units sold.
 To minimize the size of the fact table, dimension identifiers (such as time key and item key) are system-generated identifiers.
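The star layout described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table names, column names, and sample rows are assumptions made for the example and are not taken from the AllElectronics figure.

```python
import sqlite3

# Build a tiny in-memory star schema: one fact table plus four dimension tables.
# All table names, column names, and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- The fact table holds only system-generated keys and the two measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);

INSERT INTO time_dim VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
INSERT INTO item_dim VALUES (1, 'TV', 'BrandA'), (2, 'Phone', 'BrandB');
INSERT INTO branch_dim VALUES (1, 'B1');
INSERT INTO location_dim VALUES (1, 'Vancouver', 'Canada'), (2, 'Chicago', 'USA');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 1000.0, 2),
    (1, 2, 1, 1,  600.0, 1),
    (2, 1, 1, 2,  500.0, 1),
    (2, 2, 1, 2, 1200.0, 2);
""")

def sales_by_city(connection):
    """Total dollars sold per city: a single join from the fact table to one dimension."""
    rows = connection.execute("""
        SELECT l.city, SUM(f.dollars_sold)
        FROM sales_fact AS f
        JOIN location_dim AS l ON f.location_key = l.location_key
        GROUP BY l.city
        ORDER BY l.city
    """).fetchall()
    return dict(rows)

city_totals = sales_by_city(conn)
print(city_totals)
```

Each dimension is reached from the fact table with exactly one join, which is the property that gives the schema its star shape.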
Snowflake schema
 The major difference between the snowflake and star schema models is that the dimension tables of the snowflake model may be kept in normalized form to reduce redundancies.
 Such a table is easy to maintain and saves storage space. However, this saving of space is negligible in comparison to the typical magnitude of the fact table.
 Furthermore, the snowflake structure can reduce the effectiveness of browsing, since more joins will be needed to execute a query. Consequently, the system performance may be adversely impacted. Hence, although the snowflake schema reduces redundancy, it is not as popular as the star schema in data warehouse design.
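To make the extra-join cost concrete, here is a small sqlite3 sketch in which an item dimension is normalized into separate item and supplier tables. The table names, columns, and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Snowflaked item dimension: supplier attributes are moved out of item_dim
# into their own table. All names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item_dim (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key)
);
CREATE TABLE sales_fact (
    item_key     INTEGER REFERENCES item_dim(item_key),
    dollars_sold REAL
);

INSERT INTO supplier_dim VALUES (1, 'wholesale'), (2, 'retail');
INSERT INTO item_dim VALUES (1, 'TV', 1), (2, 'Phone', 1), (3, 'Radio', 2);
INSERT INTO sales_fact VALUES (1, 1000.0), (2, 600.0), (3, 150.0), (1, 250.0);
""")

def sales_by_supplier_type(connection):
    """Needs two joins (fact -> item -> supplier) where a star schema needs one."""
    rows = connection.execute("""
        SELECT s.supplier_type, SUM(f.dollars_sold)
        FROM sales_fact AS f
        JOIN item_dim AS i     ON f.item_key = i.item_key
        JOIN supplier_dim AS s ON i.supplier_key = s.supplier_key
        GROUP BY s.supplier_type
        ORDER BY s.supplier_type
    """).fetchall()
    return dict(rows)

supplier_totals = sales_by_supplier_type(conn)
print(supplier_totals)
```

In a star schema the supplier type would be stored redundantly in the item table, so the same question would need only the first join.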

Fact constellation
 Sophisticated applications may require multiple fact tables to share dimension tables.
 This kind of schema can be viewed as a collection of stars, and hence is called a galaxy schema or a fact constellation.
Data Cube Implementation

1) Pre-compute and store all
 This means that millions of aggregates will need to be computed and stored.
 Although this is the best solution as far as query response time is concerned, the solution is impractical since resources required to compute the aggregates and to store them will be prohibitively large for a large data cube. Indexing large amounts of data is also expensive.

2) Pre-compute (and store) none
 This means that the aggregates are computed on the fly using the raw data whenever a query is posed.
 This approach does not require additional space for storing the cube, but the query response time is likely to be very poor for large data cubes.

3) Pre-compute and store some
 This means that we pre-compute and store the most frequently queried aggregates and compute others as the need arises.
 Not all of the remaining aggregates can be derived from the pre-computed aggregates, so it will be necessary to access the database (e.g. the data warehouse) to compute them.
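The third strategy can be sketched as a small cache in front of the raw data. The class name PartialCube, the sample rows, and the choice of which group-bys to store are all assumptions made for this illustration. The sketch also does not try to derive one aggregate from another: anything not stored is recomputed from the raw rows.

```python
from collections import defaultdict

# Hypothetical raw fact rows: (city, item, year, dollars_sold).
FACTS = [
    ("Vancouver", "TV", 2022, 1000.0),
    ("Vancouver", "Phone", 2023, 600.0),
    ("Chicago", "TV", 2023, 500.0),
    ("Chicago", "Phone", 2023, 1200.0),
]
DIMENSIONS = ("city", "item", "year")

def aggregate(rows, group_by):
    """Sum dollars_sold over the raw rows for the given tuple of dimension names."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)

class PartialCube:
    """Pre-computes a chosen subset of group-bys and computes the rest on demand."""

    def __init__(self, rows, precompute):
        self.rows = rows
        self.raw_scans = 0  # counts queries that had to fall back to the raw data
        self.stored = {gb: aggregate(rows, gb) for gb in precompute}

    def query(self, group_by):
        if group_by in self.stored:      # answered from a stored aggregate
            return self.stored[group_by]
        self.raw_scans += 1              # not materialized: scan the raw data
        return aggregate(self.rows, group_by)

cube = PartialCube(FACTS, precompute=[("city",), ("city", "item")])
by_city = cube.query(("city",))   # served from the pre-computed aggregates
by_year = cube.query(("year",))   # computed on the fly from the raw rows
print(by_city, by_year, cube.raw_scans)
```

Storing every group-by corresponds to strategy 1, and passing an empty precompute list corresponds to strategy 2.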
Efficient Computation of Data Cubes
 At the core of multidimensional data analysis is the
efficient computation of aggregations across many sets of
dimensions. In SQL terms, these aggregations are referred
to as group-by’s.
 Each group-by can be represented by a cuboid, where the
set of group-by’s forms a lattice of cuboids defining a data
cube.
 A data cube is a lattice of cuboids. Suppose that you would
like to create a data cube for AllElectronics sales that
contains the following: city, item, year, and sales in dollars.
You would like to be able to analyze the data, with queries
such as the following:
 “Compute the sum of sales, grouping by city and item.”
 “Compute the sum of sales, grouping by city.”
 “Compute the sum of sales, grouping by item.”
 The possible group-by’s are the following: (city, item, year), (city, item), (city, year), (item, year), (city), (item), (year), (), where () means that the group-by is empty (i.e., the dimensions are not grouped).
 These group-by’s form a lattice of cuboids for the data cube, as shown in the figure.
 The base cuboid contains all three dimensions, city, item, and year.
 It can return the total sales for any combination of the three dimensions. The apex cuboid, or 0-D cuboid, refers to the case where the group-by is empty.
 It contains the total sum of all sales.
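The lattice for the three dimensions city, item, and year can be enumerated with a few lines of Python. The sample rows and the helper names cuboids and group_by_sum are assumptions made for this sketch. Each cuboid is computed directly from the raw rows, which is the simplest approach rather than an efficient one.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("city", "item", "year")

# Hypothetical fact rows: (city, item, year, dollars_sold).
FACTS = [
    ("Vancouver", "TV", 2022, 1000.0),
    ("Vancouver", "Phone", 2023, 600.0),
    ("Chicago", "TV", 2023, 500.0),
    ("Chicago", "Phone", 2023, 1200.0),
]

def cuboids(dimensions):
    """Return every group-by as a tuple, from the base cuboid down to the apex ()."""
    result = []
    for k in range(len(dimensions), -1, -1):
        result.extend(combinations(dimensions, k))
    return result

def group_by_sum(rows, group_by):
    """Sum dollars_sold for one cuboid, keyed by the values of the grouped dimensions."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

lattice = cuboids(DIMENSIONS)
cube = {gb: group_by_sum(FACTS, gb) for gb in lattice}

for gb in lattice:
    print(gb, cube[gb])
```

The first entry of lattice is the base cuboid (city, item, year) and the last is the apex cuboid (), whose single cell holds the grand total.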

“How many cuboids are there in an n-dimensional data cube?”

➢ The dimension time is usually not explored at only one
conceptual level, such as year, but rather at multiple
conceptual levels, such as in the hierarchy “day < month <
quarter < year”.
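➢ For an n-dimensional data cube, the total number of cuboids that can be generated (including the cuboids generated by climbing up the hierarchies along each dimension) is:

$$\text{Total number of cuboids} = \prod_{i=1}^{n} (L_i + 1)$$

➢ where L_i is the number of levels associated with dimension i. One is added to L_i to include the virtual top level, all.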
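A quick numeric check of the count, assuming the usual product of (L_i + 1) over all dimensions, where the extra 1 stands for the virtual top level "all". The function name total_cuboids and the level counts below are illustrative.

```python
from math import prod

def total_cuboids(levels):
    """Number of cuboids for a cube whose i-th dimension has levels[i] hierarchy levels.

    The +1 per dimension accounts for the virtual top level 'all',
    i.e. the dimension being aggregated away entirely.
    """
    return prod(l + 1 for l in levels)

# Three dimensions with a single level each (city, item, year): 2^3 cuboids.
print(total_cuboids([1, 1, 1]))

# Time explored at four levels (day < month < quarter < year), the others at one.
print(total_cuboids([4, 1, 1]))

# Ten dimensions with five levels each.
print(total_cuboids([5] * 10))
```

With one level per dimension the count reduces to 2^n, which matches the eight group-by's listed for city, item, and year.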
