
Information System Decision-Making: DSS Architecture


ANS 1

A decision support system (DSS) is an information system that supports business or organizational decision-making activities. DSSs serve the management, operations and planning levels of an organization (usually mid and higher management) and help people make decisions about problems that may be rapidly changing and not easily specified in advance, i.e. unstructured and semi-structured decision problems. Decision support systems can be fully computerized, human-powered, or a combination of both.
While academics have perceived DSS as a tool to support the decision-making process, DSS users see DSS as a tool to facilitate organizational processes.[1] Some authors have extended the definition of DSS to include any system that might support decision making, and some DSS include a decision-making software component; Sprague (1980)[2] defines a properly termed DSS as follows:
1. DSS tends to be aimed at the less well structured, underspecified problems that upper-level managers typically face;
2. DSS attempts to combine the use of models or analytic techniques with traditional data access and retrieval functions;
3. DSS specifically focuses on features which make them easy to use by non-computer-proficient people in an interactive mode; and
4. DSS emphasizes flexibility and adaptability to accommodate changes in the environment and the decision-making approach of the user.
DSSs include knowledge-based systems. A properly designed DSS is an interactive software-based system intended to help decision makers compile useful information from a combination of raw data, documents, and personal knowledge, or business models to identify and solve problems and make decisions.
Typical information that a decision support application might gather and present includes:
 inventories of information assets (including legacy and relational data sources, cubes, data
warehouses, and data marts),
 comparative sales figures between one period and the next,
 projected revenue figures based on product sales assumptions.
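To make the comparative and projected figures listed above concrete, here is a minimal sketch in Python, assuming pandas is available; the periods, products, revenue figures and the 5% growth assumption are purely illustrative:

# Sketch of the comparative and projected figures a DSS might present.
# The table, column names, and growth assumption are illustrative only.
import pandas as pd

sales = pd.DataFrame({
    "period":  ["2023-Q4", "2023-Q4", "2024-Q1", "2024-Q1"],
    "product": ["A", "B", "A", "B"],
    "revenue": [120_000, 80_000, 150_000, 70_000],
})

# Comparative sales figures between one period and the next
by_period = sales.groupby("period")["revenue"].sum()
comparison = by_period.pct_change().rename("growth_vs_prior_period")

# Projected revenue based on a simple product-sales assumption (5% growth)
assumed_growth = 0.05
projection = sales[sales["period"] == "2024-Q1"].assign(
    projected_next_period=lambda df: df["revenue"] * (1 + assumed_growth)
)

print(comparison)
print(projection)
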
DSS Architecture and Types
There are four fundamental components of DSS architecture:
 User Interface
 Database
 Model (context or situation representation)
 Knowledge

 User Interface
In the previous article, we learnt what it takes to design and build an effective user interface. Since it is a full-fledged subject of study, we looked at the prerequisites of a good user interface design, concerns pertaining to dialogue development, flexibility, the mode of feeding information, interface design rules, and the factors influencing the success of a user interface design.
 The Database
Next comes the database. It serves as the storehouse of information. It contains:
i. Personal/internal information – details collected from within the organization, from employees and customers. It may include ideas, your own thoughts, experiences and insights.
ii. External information – information collected from outside sources, such as independent research, the internet, government organizations, etc.

A DSS accesses information directly from the database, depending upon your needs and the type of decision you are making. A decision support system architecture scheme focuses on:
i. the type of database required for a particular decision-making system model,
ii. who is responsible for different types of databases, and
iii. how to maintain the accuracy and security of the database.
 Model
This component of DSS architecture takes care of:
i. DSS model and
ii. DSS model management system

While a model is a representation of a context, a situation or an event, a DSS model management system stores and maintains DSS models.
A model is an important component of DSS architecture because it allows you to carry out the particular type of data analysis you need for a particular kind of decision-making. For example, if you need to understand what happens when you change a particular variable, a spreadsheet-based model will help you conduct what-if analysis.
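As a minimal sketch of what-if analysis, the following Python snippet plays the role of such a spreadsheet-based model; the profit formula, prices and cost figures are illustrative assumptions, not part of any particular DSS:

# A tiny spreadsheet-style what-if model: change one input variable and
# observe the effect on the outcome. All figures here are illustrative.
def monthly_profit(units_sold: int, unit_price: float,
                   unit_cost: float, fixed_costs: float) -> float:
    """Profit = revenue - variable costs - fixed costs."""
    return units_sold * (unit_price - unit_cost) - fixed_costs

baseline = monthly_profit(units_sold=1_000, unit_price=25.0,
                          unit_cost=15.0, fixed_costs=6_000)

# What if the unit price changes while everything else stays fixed?
for price in (22.0, 25.0, 28.0):
    scenario = monthly_profit(1_000, price, 15.0, 6_000)
    print(f"price={price:.2f} -> profit={scenario:,.0f} (baseline {baseline:,.0f})")
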
 Knowledge
This element of DSS architecture provides information about relationships among data that are too complex for the database component alone to represent. It manages this knowledge and provides decision makers with alternative solutions to a problem. It also signals decision makers when there is a mismatch between forecasted and actual results.

ANS 2
Multinational organizations have faced the problem of reporting and analysis across regions and
countries for decades. Combining data from different cultures, jurisdictions, time zones, alphabets, and
platforms is always highly challenging. Advanced technologies are helping to make these tasks easier
than ever before, but even today’s architects often must respond to pressures from the line of business
to allow reporting that spans multiple countries. However, there are ways to address these challenges.
Many architects facing these problems often ask the following questions:

 What are the best practices for building data warehouses for multinational organizations?
 Can we act as one company with a very diverse customer base?
 How can we create regional hubs with common development and support?
 Will multinational data warehouses achieve significant improvements in information governance and
data quality?
 How can a single best practice solution be enforced across regional hubs?
 Is building a single data model and using it in plug-and-play fashion possible?

Before designing a data warehouse for any multinational organization, architects need to ensure there
is real business value for creating one in the first place. Reporting across all regions may seem like a
good idea, but in reality many decisions are made locally, except for decisions that impact major
product development and strategy.

Advantages of analysis shared across regions


For example, one organization has significant operations in 10 countries and has strict, centralized
control over billing, marketing, and branding. This company wants to measure the effectiveness of
marketing campaigns and product launches across resellers and countries. Its goal is to spot the most
successful campaigns and share them across other regions. While this business goal is valid, is the
value enough to justify developing a multinational data warehouse? This organization thinks so.

In another example, a multinational retailer expanding into new countries has a hands-off strategy for each regional business. However, this multinational organization is being affected by disjointed business practices that make comparing regions difficult. The company is also experiencing shrinkage disproportionately in certain regions. It views a detailed, multinational data warehouse as a method to enable tighter controls over its regional management.

There is always value in high-level summary analysis shared across regions. This kind of analysis
allows management to determine the overall health of the business and make strategic decisions about
its growth and investment. Financial regulators require some form of consolidation of data, so
consolidation is always necessary. The question is whether detailed analytics across countries is
valuable.
Expanded detailed analysis from a highly sophisticated consolidation of accounts and transactions can
enhance the operations of each regional business unit. Best practices can be discovered and
propagated across units. A product can be hot in one locale and cold in another simply because of
clever marketing campaigns or beneficial commission structures. Without detailed analytics that span
multiple regions, these potentially profitable situations may not be detected and exploited across those
regions. Accurately measuring the value prior to the data warehouse project can be very challenging.

Implications of expansion across country borders


Many multinational organizations grow through acquisition. This kind of growth creates a scenario in
which each region can have different operational and analytical systems. The analytical systems in
each region tend to have different data models that make consolidation a challenge. For example,
revenue per customer might be calculated differently in each region because of different accounting
rules among countries or simply because of different preferences by the local management. These
complexities are in addition to the obvious problems of merging data from different languages, number
and date formats, currencies, time zones, character sets, collating sequences, and so on.

While a consolidated data warehouse across regions has some advantages, a fundamental question
should be asked: What value is there in comparing detail data from one region to another? Each region
may have different rate plans, product mixes, legal requirements, cultural and regional preferences,
and languages. So comparing regions might not have as much business value as it appears on the
surface. For example, in the telecommunications industry, some regions may prefer prepay phones
while other regions may prefer long-term contracts for smartphones. The ratio can be as much as five
prepay phones to one contract phone in one country, and the reverse in other countries, making
country-to-country comparisons less valuable.

ANS 3
Extraction is the operation of extracting data from a source system for further use in a data warehouse
environment. This is the first step of the ETL process. After the extraction, this data can be transformed and
loaded into the data warehouse.

The source systems for a data warehouse are typically transaction processing applications. For example, one of
the source systems for a sales analysis data warehouse might be an order entry system that records all of the
current order activities.

Designing and creating the extraction process is often one of the most time-consuming tasks in the ETL process
and, indeed, in the entire data warehousing process. The source systems might be very complex and poorly
documented, and thus determining which data needs to be extracted can be difficult. Normally, the data has to be extracted not only once, but several times in a periodic manner to supply all changed data to the warehouse and keep it up to date. Moreover, the source system typically cannot be modified, nor can its performance or availability be adjusted, to accommodate the needs of the data warehouse extraction process.

These are important considerations for extraction and ETL in general. This chapter, however, focuses on the
technical considerations of having different kinds of sources and extraction methods. It assumes that the data
warehouse team has already identified the data that will be extracted, and discusses common techniques used for
extracting data from source databases.

Designing this process means making decisions about the following two main aspects:

 Which extraction method do I choose?

This influences the source system, the transportation process, and the time needed for refreshing the
warehouse.

 How do I provide the extracted data for further processing?

This influences the transportation method, and the need for cleaning and transforming the data.
Extraction Methods in Data Warehouses
The extraction method you should choose is highly dependent on the source system and also on the business needs in the target data warehouse environment. Very often, there is no possibility of adding additional logic to the source systems to support incremental extraction of data, because of the performance impact or the increased workload on these systems. Sometimes the customer is not even allowed to add anything to an out-of-the-box application system.

The estimated amount of the data to be extracted and the stage in the ETL process (initial load or maintenance of
data) may also impact the decision of how to extract, from a logical and a physical perspective. Basically, you
have to decide how to extract data logically and physically.
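As a rough sketch of logical incremental extraction, the following Python snippet pulls only the rows changed since the last run; the orders table, the last_modified column and the in-memory SQLite stand-in for the source are assumptions made for illustration, and many real source systems offer no such column, which is exactly the limitation described above:

# Sketch of timestamp-based incremental extraction. Table and column names
# (orders, last_modified) and the in-memory SQLite source are illustrative assumptions.
import sqlite3
from datetime import datetime

# Stand-in for the real source system
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, last_modified TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 99.5, "2023-12-30T10:00:00"), (2, 42.0, "2024-01-05T09:30:00")])

def extract_changed_rows(connection: sqlite3.Connection, since: datetime):
    """Pull only the rows changed since the last successful extraction."""
    cursor = connection.execute(
        "SELECT order_id, amount, last_modified FROM orders WHERE last_modified > ?",
        (since.isoformat(),),
    )
    return cursor.fetchall()

last_run = datetime(2024, 1, 1)   # normally read from ETL metadata, not hard-coded
changed = extract_changed_rows(conn, last_run)
print(f"{len(changed)} changed row(s) to stage:", changed)
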

Transformation
Data transformation is the process of converting data from one format or structure into another. It is a fundamental aspect of most data integration[1] and data management tasks, such as data wrangling, data warehousing, data integration and application integration.
Data transformation can be simple or complex based on the required changes to the data between the
source (initial) data and the target (final) data. Data transformation is typically performed via a mixture
of manual and automated steps.[2] Tools and technologies used for data transformation can vary widely
based on the format, structure, complexity, and volume of the data being transformed.
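A minimal sketch of such a transformation step is shown below; the field names, the date-format conversion and the number-format conversion are assumptions chosen for illustration:

# Convert source records into the structure expected by the target system.
# Field names and formats are illustrative assumptions.
from datetime import datetime

source_rows = [
    {"cust_name": "  ACME GmbH  ", "order_date": "31/12/2024", "amount": "1.234,50"},
]

def transform(row: dict) -> dict:
    return {
        # Trim stray whitespace around names
        "customer_name": row["cust_name"].strip(),
        # Reformat dates from DD/MM/YYYY to ISO 8601
        "order_date": datetime.strptime(row["order_date"], "%d/%m/%Y").date().isoformat(),
        # Convert a European-formatted number into a float
        "amount": float(row["amount"].replace(".", "").replace(",", ".")),
    }

print([transform(r) for r in source_rows])
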
A master data recast is another form of data transformation where the entire database of data values is
transformed or recast without extracting the data from the database. All data in a well designed
database is directly or indirectly related to a limited set of master database tables by a network
of foreign key constraints. Each foreign key constraint is dependent upon a unique database index from
the parent database table. Therefore, when the proper master database table is recast with a different
unique index, the directly and indirectly related data are also recast or restated. The directly and
indirectly related data may also still be viewed in the original form since the original unique index still
exists with the master data. Also, the database recast must be done in such a way as not to impact the application architecture software.
When the data mapping is indirect via a mediating data model, the process is also called data
mediation.

LOADING
Data can come in various formats. It can be extracted from the source database directly, or it may be loaded from files. When we extract data directly, all we need to do is check that the connection is working. This is usually done automatically by the ETL automation tool. In the case of files, we need to obtain the files first. They might be stored on a remote FTP server or somewhere on the web. Those files have to be copied to a location where they can be accessed by the ETL tool.

Extraction.
SQL can be used to extract the data from a database, or a text parser can be used to extract the data from fixed-width or delimited files.
Extracting data often involves the transfer of large amounts of data from source
operational systems. Such operations can impose significant processing loads on the
databases involved and should be performed during a period of relatively low system load
or overnight.
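A small sketch of the file-based path, parsing a delimited extract file with Python's csv module; the file name, delimiter and the stage() routine are illustrative assumptions:

# Parse a delimited extract file into rows ready for staging.
# File name and column layout are illustrative assumptions.
import csv

def read_delimited_extract(path: str, delimiter: str = "|"):
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        for row in reader:
            yield row

# Usage (once the source system has dropped the file, e.g. via FTP):
# for row in read_delimited_extract("orders_20240101.psv"):
#     stage(row)   # hypothetical staging routine
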

Cleansing and Transforming data.


Data cleansing is a process of checking data against a predefined set of rules, for example:
 Checking date formats
 Checking field length
 Pattern validation
 Data type checks
A lot of data transformations can be performed during the process of extracting data from
the source systems. However, there are often some additional tasks to execute before
loading the data into the data warehouse, for example reconciling inconsistent data from heterogeneous data sources after extraction, completing other formatting and cleansing tasks, and generating surrogate keys.
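A brief sketch of rule-based cleansing checks like those listed above, together with a simple surrogate-key counter; the rules, field names and key scheme are illustrative assumptions:

# Rule-based cleansing checks and a simple surrogate-key assignment.
# Rules, field names, and the key scheme are illustrative assumptions.
import re
from datetime import datetime
from itertools import count

def is_valid(row: dict) -> bool:
    try:
        datetime.strptime(row["order_date"], "%Y-%m-%d")       # date format check
    except ValueError:
        return False
    if len(row["country_code"]) != 2:                           # field length check
        return False
    if not re.fullmatch(r"[A-Z]{3}-\d{6}", row["order_ref"]):   # pattern validation
        return False
    return isinstance(row["amount"], (int, float))              # data type check

surrogate_key = count(start=1)
cleansed = [
    {**row, "order_sk": next(surrogate_key)}                    # generate surrogate key
    for row in [{"order_date": "2024-01-31", "country_code": "DE",
                 "order_ref": "ORD-000123", "amount": 99.5}]
    if is_valid(row)
]
print(cleansed)
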

ANS 5
Data mining is the computing process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.[1] It is an essential process where intelligent methods are applied to extract data patterns.[1][2] It is an interdisciplinary subfield of computer science.[1][3][4] The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.[1] Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.[1] Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD.[5]
The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to
extract previously unknown, interesting patterns such as groups of data records (cluster analysis),
unusual records (anomaly detection), and dependencies (association rule mining, sequential pattern
mining). This usually involves using database techniques such as spatial indices. These patterns can
then be seen as a kind of summary of the input data, and may be used in further analysis or, for
example, in machine learning and predictive analytics. For example, the data mining step might identify
multiple groups in the data, which can then be used to obtain more accurate prediction results by
a decision support system. Neither the data collection, data preparation, nor the result interpretation and reporting is part of the data mining step, but they do belong to the overall KDD process as additional steps.
The related terms data dredging, data fishing, and data snooping refer to the use of data mining
methods to sample parts of a larger population data set that are (or may be) too small for reliable
statistical inferences to be made about the validity of any patterns discovered. These methods can,
however, be used in creating new hypotheses to test against the larger data populations.
Knowledge Discovery
Some people don’t differentiate data mining from knowledge discovery, while others view data mining as an essential step in the process of knowledge discovery. Here is the list of steps involved in the knowledge discovery process −
 Data Cleaning − In this step, the noise and inconsistent data is removed.
 Data Integration − In this step, multiple data sources are combined.
 Data Selection − In this step, data relevant to the analysis task are retrieved from
the database.
 Data Transformation − In this step, data is transformed or consolidated into
forms appropriate for mining by performing summary or aggregation operations.
 Data Mining − In this step, intelligent methods are applied in order to extract data
patterns.
 Pattern Evaluation − In this step, data patterns are evaluated.
 Knowledge Presentation − In this step, knowledge is represented.
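A compact sketch of these steps on a toy in-memory data set; the data, the pandas-based approach and the trivial stand-ins for the last three steps are illustrative assumptions:

# Toy walk-through of the KDD steps above on illustrative in-memory data.
import pandas as pd

# Data Cleaning: drop rows with missing values and duplicates
raw = pd.DataFrame({"customer": ["a", "b", "b", None],
                    "spend": [120.0, 80.0, 80.0, 55.0]})
clean = raw.dropna().drop_duplicates()

# Data Integration: combine a second source on a shared key
regions = pd.DataFrame({"customer": ["a", "b"], "region": ["EU", "US"]})
integrated = clean.merge(regions, on="customer")

# Data Selection: keep only the attributes relevant to the analysis task
selected = integrated[["region", "spend"]]

# Data Transformation: consolidate by aggregation
summary = selected.groupby("region")["spend"].sum()

# Data Mining / Pattern Evaluation / Knowledge Presentation (trivial stand-ins)
top_region = summary.idxmax()
print(summary, f"\nHighest-spending region: {top_region}")
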
Clustering
Clustering is the process of grouping a set of abstract objects into classes of similar objects.
Points to Remember
 A cluster of data objects can be treated as one group.
 While doing cluster analysis, we first partition the set of data into groups based on
data similarity and then assign the labels to the groups.
 The main advantage of clustering over classification is that it is adaptable to changes and helps single out useful features that distinguish different groups.
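A short sketch of cluster analysis using k-means; the two-dimensional points, the choice of scikit-learn and k = 2 are assumptions made for illustration:

# Group unlabeled points into clusters of similar objects using k-means.
# The points and k=2 are illustrative assumptions; scikit-learn is assumed available.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # one dense group
                   [8.0, 8.2], [7.8, 8.1], [8.3, 7.9]])  # another dense group

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print("labels assigned to each point:", model.labels_)
print("cluster centres:", model.cluster_centers_)
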
Classification
Classification models predict categorical class labels, while prediction models predict continuous-valued functions. For example, we can build a classification model to categorize bank loan applications as either safe or risky, or a prediction model to predict the expenditures in dollars of potential customers on computer equipment given their income and occupation.

Following are examples of cases where the data analysis task is classification −
 A bank loan officer wants to analyze the data in order to know which customers (loan applicants) are risky and which are safe.
 A marketing manager at a company needs to analyze a customer with a given profile to determine whether the customer will buy a new computer.
In both of the above examples, a model or classifier is constructed to predict the
categorical labels. These labels are risky or safe for loan application data and yes or no
for marketing data.
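A minimal sketch of the loan example using a decision tree classifier; the features, training labels and the use of scikit-learn are assumptions made for illustration:

# Train a classifier to label loan applications as "safe" or "risky".
# Features, labels, and the decision-tree choice are illustrative assumptions.
from sklearn.tree import DecisionTreeClassifier

# [income in thousands, existing debt in thousands]
X_train = [[75, 5], [40, 30], [90, 10], [30, 25], [60, 8], [35, 40]]
y_train = ["safe", "risky", "safe", "risky", "safe", "risky"]

classifier = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

new_applicant = [[55, 12]]
print("predicted label:", classifier.predict(new_applicant)[0])
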
ANS 6
Online analytical processing, or OLAP, is an approach to answering multi-dimensional analytical (MDA) queries swiftly in computing.[1] OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining.[2] Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM),[3] budgeting and forecasting, financial reporting and similar areas, with new applications coming up, such as agriculture.[4] The term OLAP was created as a slight modification of the traditional database term online transaction processing (OLTP).[5]
OLAP tools enable users to analyze multidimensional data interactively from multiple perspectives.
OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and
dicing.[6] Consolidation involves the aggregation of data that can be accumulated and computed in one
or more dimensions. For example, all sales offices are rolled up to the sales department or sales
division to anticipate sales trends. By contrast, the drill-down is a technique that allows users to
navigate through the details. For instance, users can view the sales by individual products that make up
a region's sales. Slicing and dicing is a feature whereby users can take out (slicing) a specific set of
data of the OLAP cube and view (dicing) the slices from different viewpoints. These viewpoints are
sometimes called dimensions (such as looking at the same sales by salesperson or by date or by
customer or by product or by region, etc.)
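These three operations can be sketched on a small fact table using pandas; the columns and values are illustrative assumptions, and this is a toy stand-in rather than an OLAP server:

# Sketch of roll-up, drill-down, and slicing on a tiny in-memory fact table.
# Columns and values are illustrative assumptions; this is not an OLAP engine.
import pandas as pd

facts = pd.DataFrame({
    "region":  ["EU", "EU", "US", "US"],
    "office":  ["Berlin", "Paris", "Boston", "Austin"],
    "product": ["A", "B", "A", "B"],
    "sales":   [100, 150, 120, 90],
})

# Consolidation (roll-up): aggregate offices up to the region level
rollup = facts.groupby("region")["sales"].sum()

# Drill-down: navigate back into the office-level detail for one region
drilldown = facts[facts["region"] == "EU"].groupby("office")["sales"].sum()

# Slice and dice: take out one product and view it by a different dimension
slice_a = facts[facts["product"] == "A"]
dice = slice_a.pivot_table(index="region", values="sales", aggfunc="sum")

print(rollup, drilldown, dice, sep="\n\n")
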
Databases configured for OLAP use a multidimensional data model, allowing for complex analytical
and ad hoc queries with a rapid execution time.[7] They borrow aspects of navigational
databases, hierarchical databases and relational databases.
An Online Analytical Processing (OLAP) server is based on the multidimensional data model. It allows managers and analysts to gain insight into information through fast, consistent, and interactive access to it. This chapter covers the types of OLAP servers, operations on OLAP, and the differences between OLAP, statistical databases, and OLTP.

Types of OLAP Servers

We have four types of OLAP servers −
 Relational OLAP (ROLAP)
 Multidimensional OLAP (MOLAP)
 Hybrid OLAP (HOLAP)
 Specialized SQL Servers
Relational OLAP
ROLAP servers are placed between the relational back-end server and the client front-end tools. To store and manage warehouse data, ROLAP uses a relational or extended-relational DBMS.
ROLAP includes the following −
 Implementation of aggregation navigation logic.
 Optimization for each DBMS back end.
 Additional tools and services.
Multidimensional OLAP
MOLAP uses array-based multidimensional storage engines for multidimensional views of
data. With multidimensional data stores, the storage utilization may be low if the data set
is sparse. Therefore, many MOLAP servers use two levels of data storage representation to handle dense and sparse data sets.
Hybrid OLAP
Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers the higher scalability of ROLAP and the faster computation of MOLAP. HOLAP servers allow large volumes of detailed data to be stored, while the aggregations are stored separately in a MOLAP store.
Specialized SQL Servers
Specialized SQL servers provide advanced query language and query processing support
for SQL queries over star and snowflake schemas in a read-only environment.
Difference between OLAP and OLTP
Key difference: Online Analytical Processing is designed to answer multi-dimensional queries, whereas Online Transaction Processing is designed to facilitate and manage the usual business applications. While OLTP is customer-oriented, OLAP is market-oriented.
OLTP and OLAP are two common systems for the management of data. OLTP is a category of systems that manages transaction processing, while OLAP is a collection of ways to query multi-dimensional databases. This comparison helps to differentiate between the two data systems.
OLAP stands for ‘Online Analytical Processing’. It is a class of systems that provides answers to multi-dimensional queries. It manages historical data and stores only the relevant data. It is mainly characterized by a low volume of transactions. It works with consolidated data, and the typical source of data for an OLAP database is the OLTP databases or the data warehouse.
OLAP databases are highly de-normalized, which introduces redundancy but helps to improve analytic performance. Query processing can be slow and may take many hours, depending on the volume of data involved.
