Data Warehousing Assignment
ON
DATA WAREHOUSING
Large companies have a presence in many places, each of which may generate a large volume of
data. For instance, large retail chains have hundreds or thousands of stores, and insurance
companies may have data from thousands of local branches. Large organizations have a complex
internal structure, and therefore different data may be present in different locations, on
different operational systems, or under different schemas. Setting up queries on individual
sources is both cumbersome and inefficient. Moreover, the data sources may store only current
data, whereas decision makers may need access to past data as well; for instance, information
about how purchase patterns have changed in the past year could be of great importance.
Data warehousing overcomes various problems that result from the need to connect a large
number of decision support systems to a large number of operational systems by providing
a hub for subject-based, historical, consistent and non-volatile information.
By connecting decision support systems and operational systems to a centralized hub, the
number of interfaces can be reduced dramatically and information quality can be
ensured more effectively.
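To make the interface argument concrete, here is a small Python sketch. It assumes, as a simplification, that without a hub every decision support system needs a direct link to every operational system; the function names are hypothetical.

```python
def point_to_point_interfaces(num_dss: int, num_sources: int) -> int:
    # Without a hub: one interface per (decision support system, source) pair.
    return num_dss * num_sources


def hub_interfaces(num_dss: int, num_sources: int) -> int:
    # With a warehouse hub: each system needs a single link to the hub.
    return num_dss + num_sources


if __name__ == "__main__":
    for dss, sources in [(5, 10), (20, 50)]:
        print(
            f"{dss} DSS x {sources} sources: "
            f"point-to-point={point_to_point_interfaces(dss, sources)}, "
            f"hub={hub_interfaces(dss, sources)}"
        )
```

Under this assumption, 20 decision support systems and 50 sources need 1,000 direct interfaces but only 70 through the hub, and the gap widens as either count grows.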
Several studies point out that organizational issues are among the most critical
success factors for data warehouse projects.
Most enterprise data warehousing projects fail for political and organizational
reasons rather than for technical ones.
As a foundation for developing the organization of data warehousing, the concept of data
ownership has to be derived from traditional process-oriented ownership concepts.
Characteristics of Data Warehouse Design
1. Theme-Focused
2. Unified
A data warehouse design unifies and integrates similar data from different databases in a
consistent, commonly accepted way using data modeling. It incorporates data from diverse
sources such as relational and non-relational databases, flat files, mainframes, cloud-based
systems, etc. In addition, a data warehouse must maintain consistent naming conventions,
layout and coding to facilitate effective data analysis.
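A minimal Python sketch of what integration can mean in practice: two hypothetical sources record the same kind of sale with different field names, gender codes, date layouts and amount units, and each is mapped onto one agreed warehouse format. All field names and codes here are invented for illustration.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical source rows: same facts, different conventions per system.
STORE_ROWS = [{"cust_gender": "M", "sale_dt": "2023-01-05", "amt": "19.99"}]
BRANCH_ROWS = [{"sex": 1, "date": "05/01/2023", "amount_cents": 2500}]

# Assumed warehouse-wide coding for gender.
GENDER_CODES = {"M": "male", "F": "female", 1: "male", 0: "female"}


def from_store(row: dict) -> dict:
    # Store system: letter codes, ISO dates, amounts as decimal strings.
    return {
        "gender": GENDER_CODES[row["cust_gender"]],
        "sale_date": date.fromisoformat(row["sale_dt"]),
        "amount": Decimal(row["amt"]),
    }


def from_branch(row: dict) -> dict:
    # Branch system: numeric codes, day/month/year dates, amounts in cents.
    return {
        "gender": GENDER_CODES[row["sex"]],
        "sale_date": datetime.strptime(row["date"], "%d/%m/%Y").date(),
        "amount": Decimal(row["amount_cents"]) / 100,
    }


unified = [from_store(r) for r in STORE_ROWS] + [from_branch(r) for r in BRANCH_ROWS]
for record in unified:
    print(record)
```

After mapping, both records use the same codes, the same date type and the same currency unit, so they can be compared and aggregated directly.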
3. Time Variance
Unlike operational systems, a data warehouse stores data collected over an extensive time
horizon. Each piece of data is associated with a specific time period, which provides a
historical perspective. Moreover, once data is entered into the warehouse, it cannot be
restructured or altered.
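Because every warehouse row carries the period it describes, historical questions such as how purchase patterns changed over the past year become simple comparisons. The Python sketch below uses invented yearly sales rows; the field names and figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: each row records the year it describes.
SALES = [
    {"year": 2022, "category": "toys", "units": 120},
    {"year": 2022, "category": "books", "units": 80},
    {"year": 2023, "category": "toys", "units": 90},
    {"year": 2023, "category": "books", "units": 110},
]


def units_by_year(rows, category):
    # Total units per year for one category.
    totals = defaultdict(int)
    for row in rows:
        if row["category"] == category:
            totals[row["year"]] += row["units"]
    return dict(totals)


def change(rows, category, old_year, new_year):
    # Difference in units between two periods (positive means growth).
    totals = units_by_year(rows, category)
    return totals.get(new_year, 0) - totals.get(old_year, 0)


print(change(SALES, "toys", 2022, 2023))   # -30
print(change(SALES, "books", 2022, 2023))  # 30
```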
4. Non-volatility
Another important characteristic is non-volatility, which means that existing data is not
removed when new data is loaded into the data warehouse. Moreover, data is read-only for
users and is refreshed periodically to deliver a complete and up-to-date picture to
the user.
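One way to picture non-volatility is an append-only table: each load adds dated snapshot rows and never updates or deletes earlier ones, and users only receive a read-only view. This Python sketch is illustrative; `Snapshot` and `AppendOnlyTable` are hypothetical names, not part of any warehouse product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)  # frozen: a loaded row cannot be modified afterwards
class Snapshot:
    load_date: date
    customer_id: str
    balance: int


@dataclass
class AppendOnlyTable:
    _rows: list = field(default_factory=list)

    def load(self, batch: list) -> None:
        # New data is appended; earlier rows are left exactly as they were.
        self._rows.extend(batch)

    def rows(self) -> tuple:
        # Users receive an immutable tuple, so they can read but not change it.
        return tuple(self._rows)

    def as_of(self, customer_id: str, day: date) -> Optional[Snapshot]:
        # Latest snapshot for the customer loaded on or before the given day.
        matches = [r for r in self._rows
                   if r.customer_id == customer_id and r.load_date <= day]
        return max(matches, key=lambda r: r.load_date, default=None)


table = AppendOnlyTable()
table.load([Snapshot(date(2023, 1, 31), "C1", 100)])
table.load([Snapshot(date(2023, 2, 28), "C1", 150)])  # refresh: old row kept
print(len(table.rows()))                              # 2
print(table.as_of("C1", date(2023, 2, 1)).balance)    # 100
```

Keeping every snapshot is what lets the warehouse answer "as of" questions that an operational system, which overwrites the balance, cannot.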
2. Metadata repository
3. Warehouse/database technology
4. Data marts
5. Data query, reporting, analysis, and mining tools
The data in a data warehouse comes from operational applications. As data enters the data
warehouse, it is transformed into an integrated structure and format. The transformation
process involves conversion, summarization, filtering and condensation. The data warehouse
must be capable of holding and managing large volumes of data, as well as data with
different structures, over time.
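The transformation steps named above can be sketched as a tiny Python pipeline over invented order lines. Which rows to filter out (here, cancelled orders) and the monthly grain of the summary are assumptions made for illustration.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical operational order lines from two stores.
ORDERS = [
    {"store": "S1", "day": "2023-03-01", "status": "paid", "amount": "10.50"},
    {"store": "S1", "day": "2023-03-01", "status": "cancelled", "amount": "4.00"},
    {"store": "S1", "day": "2023-03-02", "status": "paid", "amount": "7.25"},
    {"store": "S2", "day": "2023-03-01", "status": "paid", "amount": "3.00"},
]


def convert(row):
    # Conversion: text amounts become exact decimals; a month key is derived.
    return {**row, "amount": Decimal(row["amount"]), "month": row["day"][:7]}


def keep(row):
    # Filtering: only completed sales are loaded into the warehouse.
    return row["status"] == "paid"


def summarize(rows):
    # Summarization and condensation: many order lines become one row
    # per store and month.
    totals = defaultdict(lambda: {"orders": 0, "revenue": Decimal("0")})
    for row in rows:
        key = (row["store"], row["month"])
        totals[key]["orders"] += 1
        totals[key]["revenue"] += row["amount"]
    return dict(totals)


monthly = summarize(r for r in map(convert, ORDERS) if keep(r))
for key, values in sorted(monthly.items()):
    print(key, values)
```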
3. Metadata
It is data about data, used for maintaining, managing and using the data warehouse. It
is classified into two types:
1. Technical metadata: It contains information about data warehouse data that is used by
warehouse designers and administrators to carry out development and management tasks.
It includes:
Subject areas and info object types, including queries, reports, images, video, audio clips, etc.
Metadata helps users understand the content and find data. Metadata is stored in a
separate data store, known as an information directory or metadata repository, which
helps to integrate, maintain and view the contents of the data warehouse.
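A metadata repository can be pictured as a catalog that pairs technical facts about each column (physical name, type, source) with business-facing terms users recognize. This Python sketch is a hypothetical, in-memory stand-in; the table, column and field names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMeta:
    name: str            # technical: physical column name
    data_type: str       # technical: storage type
    source: str          # technical: where the data was loaded from
    business_name: str   # business-facing: term end users recognize
    subject_area: str    # business-facing: subject area it belongs to


# Hypothetical repository contents for one warehouse table.
REPOSITORY = {
    "fact_sales": [
        ColumnMeta("amt_net", "DECIMAL(12,2)", "pos.orders.amount", "Net sales", "Sales"),
        ColumnMeta("cust_key", "INTEGER", "crm.customer.id", "Customer", "Customer"),
    ],
}


def find(term):
    # Return (table, column) pairs whose business name mentions the term,
    # so a user can locate data without knowing physical names.
    term = term.lower()
    return [
        (table, col.name)
        for table, cols in REPOSITORY.items()
        for col in cols
        if term in col.business_name.lower()
    ]


print(find("sales"))  # [('fact_sales', 'amt_net')]
```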
4. Access tools
Their purpose is to provide information to business users for decision making. There are
five categories:
Data query and reporting tools
Application development tools
Executive info system tools (EIS)
OLAP tools
Data mining tools
Query and reporting tools are used to generate queries and reports. There are two types
of reporting tools:
Production reporting tools are used to generate regular operational reports.
Desktop report writers are inexpensive desktop tools designed for end users.
Managed query tools: These are used to generate SQL queries. They place a metalayer of software
between users and databases, which offers point-and-click creation of SQL statements. These
tools are a preferred choice for users performing segment identification, demographic analysis,
territory management, preparation of customer mailing lists, etc.
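The metalayer idea can be sketched in Python: business terms that a user would click on are mapped to SQL expressions, and the tool assembles the statement. The semantic-layer dictionaries, table names and `build_sql` helper are hypothetical; the standard-library `sqlite3` module is used only to show that the generated SQL runs.

```python
import sqlite3

# Hypothetical semantic layer: business terms mapped to SQL expressions.
DIMENSIONS = {"Region": "c.region", "Segment": "c.segment"}
MEASURES = {"Revenue": "SUM(s.amount)", "Orders": "COUNT(*)"}
FROM_CLAUSE = "sales s JOIN customer c ON s.cust_id = c.id"


def build_sql(dims, measures):
    # Turn the user's point-and-click selections into one GROUP BY statement.
    # Only known dictionary keys are accepted, so raw user text never reaches SQL.
    select = [f'{DIMENSIONS[d]} AS "{d}"' for d in dims]
    select += [f'{MEASURES[m]} AS "{m}"' for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {FROM_CLAUSE}"
    if dims:
        sql += " GROUP BY " + ", ".join(DIMENSIONS[d] for d in dims)
    return sql


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, region TEXT, segment TEXT);
    CREATE TABLE sales (cust_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'North', 'Retail'), (2, 'South', 'Retail');
    INSERT INTO sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

sql = build_sql(["Region"], ["Revenue", "Orders"])
print(sql)
print(conn.execute(sql + " ORDER BY 1").fetchall())
# [('North', 15.0, 2), ('South', 7.5, 1)]
```

The user never writes SQL; they pick "Region" and "Revenue", and the mapping decides how those terms translate to joins and aggregates.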
Application development tools: These provide a graphical data access environment that integrates
OLAP tools with the data warehouse and can be used to access all database systems.
OLAP tools: These are used to analyze data in multidimensional and complex views. To support
multidimensional analysis, they use MDDB and MRDB, where MDDB refers to a multidimensional
database and MRDB refers to a multirelational database.
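The slicing and roll-up operations that OLAP tools offer can be imitated on a handful of rows. This Python sketch holds a tiny invented cube (dimensions year, region and product, with units as the measure) in a plain list; it only illustrates the operations and makes no claim about how an MDDB or MRDB stores data.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, units).
FACTS = [
    ("2023", "North", "Toys", 100),
    ("2023", "North", "Books", 40),
    ("2023", "South", "Toys", 60),
    ("2024", "North", "Toys", 80),
    ("2024", "South", "Books", 50),
]
DIMS = ("year", "region", "product")


def roll_up(facts, keep):
    # Aggregate the measure over every dimension not listed in `keep`.
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for *coords, value in facts:
        totals[tuple(coords[i] for i in idx)] += value
    return dict(totals)


def slice_(facts, dim, member):
    # Fix one dimension to a single member (an OLAP "slice").
    i = DIMS.index(dim)
    return [f for f in facts if f[i] == member]


print(roll_up(FACTS, ["year"]))
# {('2023',): 200, ('2024',): 130}
print(roll_up(slice_(FACTS, "product", "Toys"), ["region"]))
# {('North',): 180, ('South',): 60}
```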
Data mining tools: These are used to discover knowledge from data warehouse data. They can also
be used for data visualization and data correction purposes.
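As one example of the kind of knowledge discovery meant here, the Python sketch below runs a simple market-basket analysis (support and confidence of item pairs) over invented baskets. Association analysis is only one of many data mining techniques, and the baskets and support threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets drawn from the warehouse's sales facts.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]


def frequent_pairs(baskets, min_support):
    # Item pairs that appear in at least `min_support` fraction of baskets.
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def confidence(baskets, lhs, rhs):
    # Share of baskets containing `lhs` that also contain `rhs`.
    # Assumes at least one basket contains `lhs`.
    with_lhs = [b for b in baskets if lhs in b]
    return sum(rhs in b for b in with_lhs) / len(with_lhs)


print(frequent_pairs(BASKETS, 0.5))
# {('bread', 'milk'): 0.5, ('butter', 'milk'): 0.5}
print(confidence(BASKETS, "butter", "milk"))  # 1.0
```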
5. Data marts
Data marts are departmental subsets that focus on selected subjects. They are independent and
used by dedicated user groups. They are used for rapid delivery of enhanced decision support
functionality to end users. When using data marts, the following issues must be considered:
1. Scalability: A small data mart can grow quickly along multiple dimensions, so when designing
it, the organization has to pay close attention to system scalability, consistency and
manageability.
2. Data integration