A data warehouse is a collection of integrated data from multiple sources used to support decision making. It contains subject-oriented, non-volatile data stored separately from transactional systems. Data warehouses arose due to operational systems not being optimized for analysis and reporting. They consolidate information, improve query performance, and separate decision support from transactions. Data is extracted, transformed and loaded from source systems into a data warehouse using an ETL process to integrate and organize the data and support analysis tools.
What is a Data Warehouse?
Definition: A Data Warehouse (DW) is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process.
OR
A single, complete and consistent store of data obtained from a variety of different sources and made available to end users in a way they can understand and use in a business context.
A decision support database maintained separately from the organization's operational database.
Why did the need for data warehouses arise?
- The processing load of reporting degraded the response time of the operational systems.
- The database designs of operational systems were not optimized for information analysis and reporting.
- Most organizations had more than one operational system, so company-wide reporting could not be supported from a single system.
- Developing reports in operational systems often required writing specific computer programs, which was slow and expensive.
We need data warehouses for:
- Consolidation of information resources
- Improved query performance
- Separating research and decision support functions from the operational systems
- A foundation for data mining, data visualization, advanced reporting and OLAP tools
An OLTP (online transaction processing) or operational system is used to deal with the everyday running of one aspect of an enterprise. OLTP systems are usually designed independently of each other, and it is difficult for them to share information.
Characteristics of a Data Warehouse
Subject oriented. Data are organized based on how the users refer to them, in such a way that relevant data is clustered together for easy access.
Integrated. All inconsistencies in naming conventions and value representations are removed, and a common unit of measure is established for all synonymous data elements from dissimilar databases. The warehouse contains data from most or all of an organization's operational applications, and this data is made consistent.
Nonvolatile. Data are stored in read-only format and do not change over time. Typical activities such as deletes, inserts, and changes that are performed in an operational application environment do not occur in a DW environment. Only two data operations are ever performed in the DW: data loading and data access.
Time variant. Data are not current but normally time series. The changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time.
Data warehouse environment
Data Warehouse: The queryable source of data in the enterprise. It is the union of all of its constituent data marts.
Data Mart: A logical subset of the complete data warehouse. Often viewed as a restriction of the data warehouse to a single business process or to a group of related business processes targeted toward a particular business group.
Operational Data Store (ODS): A point of integration for operational systems that were developed independently of each other. Since an ODS supports day-to-day operations, it needs to be continually updated.
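As a rough illustration of the nonvolatile and time-variant characteristics described above, the sketch below keeps a load-only history of dated snapshots and answers "what was the value as of a given date?". The class names, fields, and sample values are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded row is never modified (nonvolatile)
class Snapshot:
    customer_id: int
    city: str
    as_of: date


class CustomerHistory:
    """Load-only store: rows are appended and read, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, customer_id, city, as_of):
        # Data loading: a change in the source becomes a new dated row.
        self._rows.append(Snapshot(customer_id, city, as_of))

    def state_on(self, customer_id, when):
        # Data access: latest snapshot on or before the requested date.
        candidates = [
            r for r in self._rows
            if r.customer_id == customer_id and r.as_of <= when
        ]
        return max(candidates, key=lambda r: r.as_of, default=None)


history = CustomerHistory()
history.load(1, "Pune", date(2022, 1, 1))
history.load(1, "Delhi", date(2023, 6, 1))  # the customer moved; old row is kept
```

Because the earlier row is kept, a report can show both the current city and the city on any earlier date.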
Generic data warehouse environment
The environment for data warehouses and marts includes the following:
- Source systems that provide data to the warehouse or mart
- Data integration technology and processes that are needed to prepare the data for use
- Different architectures for storing data in an organization's data warehouse or data marts
- Different tools and applications for the variety of users
- Metadata, data quality, and governance processes, which must be in place to ensure that the warehouse or mart meets its purposes
Data Warehouse Architectures
Data warehouses and their architectures vary depending upon the specifics of an organization's situation. Three common architectures are:
- Data Warehouse Architecture (Basic)
- Data Warehouse Architecture (with a Staging Area)
- Data Warehouse Architecture (with a Staging Area and Data Marts)
Data Warehouse Architecture (Basic)
End users directly access data derived from several source systems through the data warehouse. In the figure, the metadata and raw data of a traditional OLTP system are present, as is an additional type of data: summary data. Summaries are very valuable in data warehouses because they pre-compute long operations in advance. For example, a typical data warehouse query is to retrieve something like August sales.
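The value of summary data can be sketched in a few lines: the aggregation is done once at load time, and a query such as "August sales" becomes a lookup. The rows and the function name below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical raw sales rows: (ISO date, amount)
raw_sales = [
    ("2023-07-30", 120.0),
    ("2023-08-02", 75.5),
    ("2023-08-15", 200.0),
    ("2023-09-01", 50.0),
]


def build_monthly_summary(rows):
    """Pre-compute total sales per month (the 'long operation')."""
    summary = defaultdict(float)
    for day, amount in rows:
        month = day[:7]  # "YYYY-MM" prefix of the ISO date
        summary[month] += amount
    return dict(summary)


# Built once, when the warehouse is loaded.
monthly_sales = build_monthly_summary(raw_sales)

# Answering "August sales" no longer scans the raw rows.
august_sales = monthly_sales["2023-08"]
```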
Data Warehouse Architecture (with a Staging Area)
Operational data needs to be cleaned and processed before it is put into the warehouse. This can be done programmatically, although most data warehouses use a staging area instead. A staging area simplifies building summaries and general warehouse management.
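A minimal sketch of what cleaning in a staging area might look like is shown below, assuming staged rows arrive as dictionaries of raw strings. The field names and validation rules are assumptions chosen for the example, not rules from the slides.

```python
def clean_staged_rows(staged):
    """Split staged rows into cleaned rows and rejected rows."""
    cleaned, rejected = [], []
    for row in staged:
        # Normalize the name: trim whitespace and use consistent casing.
        name = (row.get("name") or "").strip().title()
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            rejected.append(row)  # amount is missing or not numeric
            continue
        if not name:
            rejected.append(row)  # a name is required
            continue
        cleaned.append({"name": name, "amount": amount})
    return cleaned, rejected


# Hypothetical rows as they might land in the staging area.
staging = [
    {"name": "  alice smith ", "amount": "10.50"},
    {"name": "Bob", "amount": "n/a"},
    {"name": "", "amount": "3"},
]
cleaned, rejected = clean_staged_rows(staging)
```

Only the cleaned rows would go on to the warehouse; the rejected rows stay in staging for inspection.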
Data Warehouse Architecture (with a Staging Area and Data Marts)
Although this architecture is quite common, a warehouse's architecture can be customized for different groups within the organization. This can be done by adding data marts, which are systems designed for a particular line of business. The figure illustrates an example where purchasing, sales, and inventories are separated. In this example, a financial analyst might want to analyse historical data for purchases and sales.
ETL Overview
ETL stands for Extraction, Transformation, Loading.
- ETL gets data out of the source and loads it into the data warehouse; at its simplest, it is a process of copying data from one database to another.
- Data is extracted from an OLTP database, transformed to match the data warehouse schema, and loaded into the data warehouse database.
- Many data warehouses also incorporate data from non-OLTP systems such as text files, legacy systems, and spreadsheets; such data also requires extraction, transformation, and loading.
- When defining ETL for a data warehouse, it is important to think of ETL as a process, not a physical implementation.
- ETL is often a complex combination of process and technology that consumes a significant portion of the data warehouse development effort and requires the skills of business analysts, database designers, and application developers.
- It is not a one-time event, as new data is added to the data warehouse periodically: monthly, daily, or hourly.
- Because ETL is an integral, ongoing, and recurring part of a data warehouse, it should be:
  - Automated
  - Well documented
  - Easily changeable
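The extract, transform, and load steps described above can be sketched with Python's standard `sqlite3` module, using two in-memory databases to stand in for the OLTP source and the warehouse. The table names, columns, and transformation rules are hypothetical; a real ETL job would be considerably more involved.

```python
import sqlite3


def extract(source_conn):
    """Extract: read rows from the (hypothetical) OLTP orders table."""
    return source_conn.execute(
        "SELECT order_id, customer, amount_cents FROM orders"
    ).fetchall()


def transform(rows):
    """Transform: match the warehouse schema (clean names, cents to units)."""
    return [(oid, cust.strip().upper(), cents / 100.0) for oid, cust, cents in rows]


def load(warehouse_conn, rows):
    """Load: insert the transformed rows into the warehouse table."""
    warehouse_conn.executemany(
        "INSERT INTO fact_orders (order_id, customer, amount) VALUES (?, ?, ?)",
        rows,
    )
    warehouse_conn.commit()


def run_etl(source_conn, warehouse_conn):
    rows = transform(extract(source_conn))
    load(warehouse_conn, rows)
    return len(rows)


# Stand-in source system with two sample orders.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount_cents INTEGER)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, " acme ", 1250), (2, "Globex", 9900)]
)

# Stand-in warehouse database.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, customer TEXT, amount REAL)"
)

loaded = run_etl(source, warehouse)
```

Since ETL is a recurring job, a production version would also need to track which source rows were already loaded; this sketch reloads everything on each run.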
The typical extract-transform-load (ETL)-based data warehouse uses staging, data integration, and access layers to house its key functions. The staging layer or staging database stores raw data extracted from each of the disparate source data systems. The integration layer integrates the disparate data sets by transforming the data from the staging layer often storing this transformed data in an operational data store (ODS) database. The integrated data are then moved to yet another database, often called the data warehouse database, where the data is arranged into hierarchical groups often called dimensions and into facts and aggregate facts. The combination of facts and dimensions is sometimes called a star schema. The access layer helps users retrieve data. [4]
This definition of the data warehouse focuses on data storage. The main source of the data is cleaned, transformed, cataloged and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support.
However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.
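To make the facts-and-dimensions arrangement (the star schema mentioned above) more concrete, here is a small sketch using an in-memory SQLite database. The table names, columns, and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the "who/what/when" of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT);

    -- The fact table holds measures plus keys pointing at the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );

    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
    INSERT INTO dim_date VALUES (10, '2023-08'), (11, '2023-09');
    INSERT INTO fact_sales VALUES
        (1, 10, 20.0), (1, 10, 5.0), (2, 10, 40.0), (2, 11, 15.0);
    """
)

# A typical query joins the fact table to its dimensions and aggregates.
rows = conn.execute(
    """
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
    """
).fetchall()
```

The fact table sits at the center and each dimension joins to it directly, which is where the "star" name comes from.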
DATA WAREHOUSE SYSTEM IN DU
The main problem addressed by a data warehouse is that end-users have a difficult time producing ad-hoc or other specialized queries and reports. This is due to several factors:
- Most of the data is stored in ADABAS, which is difficult for end-users to access.
- The data stores were designed for transaction processing, not ad-hoc reporting.
- Obtaining the data or a report usually requires waiting for a programmer to either develop the report or provide a customized download program.
- All of the data may not be consistent as of the same point in time.
- There may not be enough copies of the data kept for historical reporting in the operational systems.
- End-users do not know what is kept in the existing data stores.
Advantages
The data warehouse addresses these factors and provides many advantages to the end-users of the University, including:
- Improved end-user access to a wide variety of University data
- Increased data consistency
- Additional documentation of the data
- Potentially lower computing costs and increased productivity
- Providing a place to combine related data from separate sources
- Creation of a computing infrastructure that can support changes in computer systems and business structures
- Empowering end-users to perform any level of ad-hoc queries or reports without impacting the performance of the operational systems
Student Data
The Student Data Warehouse was the first data warehouse to be developed at WSU. It consists of demographic information about students as well as the courses in which they are enrolled. The warehouse also contains enrollment statistics for each course offering (section). In addition, the data necessary to support this information is also stored in the warehouse. Below is a list of the major classes of data currently in the student data warehouse:
- Academic Course
- Academic Degree Conferment
- Academic Section
- Address
- Course Section Snapshot
- Email Address
- Snapshot Generation
- Student
- Student Center Snapshot
- Student Certificate
- Student Course Snapshot
- Student Course Transcript
- Student NCATE Endorsement
- Student Snapshot
- Student Transcript
- Supporting Data
Data Use
Access to data is for departmental and college use only.
- Must be for official use only
- Should not be shared with third parties (even directory information)
- Personally identifiable information cannot be shared outside the university
- Reports cannot be shared outside the university until the figures have been checked by Institutional Research to make certain they are consistent with official university figures
- Published reports must not include personally identifiable information
Access to the Data Warehouse must be protected from unauthorized use.
Departments and colleges should generally access only their students' data.
- Mailings should be restricted to those students within the college or department
- Reports should be restricted to student data within the college or department
- Institutional Research should be consulted for university-wide studies
Enrollment reporting is the domain of Institutional Research and the Registrar's Office.
- Departments and colleges should not report enrollment using the Data Warehouse
- Questions regarding perceived enrollment discrepancies should be directed to Institutional Research or the Registrar's Office
- Enrollment and FTE reports should be for internal use and should be considered estimates