Business Analytics
Business Analytics
Business Analytics
Introduction
First development:
Technological advances, Internet social networks, and
data generated from personal electronic devices,
produce incredible amounts of data for businesses.
Second development:
Ongoing research has resulted in numerous
methodological developments, including:
Cloud computing, the more recent development, is the These decisions define the organization’s overall goals
remote use of hardware and software over the Internet. and aspirations for the future.
Figure 1.1 - Google Trends Graph of Searches on the term Tactical decisions
Analytics Concern how the organization should achieve the goals
and objectives set by its strategy.
Operational decisions
Affect how the firm is run from day to day.
They are the domain of operations managers, who are
the closest to the customer.
Figure 1.1 is a graph generated by Google Trends that displays Decision making can be defined as the following process:
the search volume for the word analytics from 2004 to 2013 1. Identify and define the problem
(projected) on a percentage basis from the peak. 2. Determine the criteria that will be used to evaluate
alternative solutions
The figure clearly illustrates the recent increase in interest in 3. Determine the set of alternative solutions
analytics. 4. Evaluate the alternatives
5. Choose an alternative
Consider the case of the Thoroughbred Running Company Business analytics
(TRC). Historically, TRC had been a catalog-based retail seller Scientific process of transforming data into insight for
of running shoes and apparel. TRC sales revenue grew making better decisions.
quickly as it changed its emphasis from catalog-based sales
to Internet-based sales. Used for data-driven or fact-based decision making,
which is often seen as more objective than other
Recently, TRC decided that it should also establish retail alternatives for decision making.
stores in the malls and downtown areas of major cities. This
is a strategic decision that will take the firm in a new Tools of business analytics can aid decision making by:
direction that it hopes will complement its Internet-based Creating insights from data
strategy. Improving our ability to more accurately forecast for
planning
TRC middle managers will therefore have to make a variety Helping us quantify risk
of tactical decisions in support of this strategic decision, Yielding better alternatives through analysis and
including how many new stores to open this year, where to optimization
open these new stores, how many distribution centers will
be needed to support the new stores, and where to locate A Categorization of Analytical Methods and Models
these distribution centers.
Descriptive analytics
Operations managers in the stores will need to make day-to- It encompasses the set of techniques that describes
day decisions regarding, for instance, how many pairs of what has happened in the past.
each model and size of shoes to order from the distribution
centers and how to schedule their sales personnel. Examples - data queries, reports, descriptive statistics, data
visualization (data dashboards), data-mining
Common approaches to making decisions: techniques, and basic what-if spreadsheet
Tradition models.
Intuition
Rules of thumb Data query - It is a request for information with certain
Using the relevant data available characteristics from a database.
Business Analytics Defined
Data dashboards - Collections of tables, charts, maps, and Data mining
summary statistics that are updated as new data become Used to find patterns or relationships among elements
available. of the data in a large database; often used in predictive
analytics.
Uses of dashboards
To help management monitor specific aspects of the Example:
company’s performance related to their decision-making A large grocery store chain might be interested in developing
responsibilities. a new targeted marketing campaign that offers a discount
coupon on potato chips.
For corporate-level managers, daily data dashboards
might summarize sales by region, current inventory By studying historical point-of-sale data, the store may be
levels, and other company-wide metrics. able to use data mining to predict which customers are the most
likely to respond to an offer on discounted chips by purchasing
Front-line managers may view dashboards that contain higher-margin items such as beer or soft drinks in addition to the
metrics related to staffing levels, local inventory levels, chips, thus increasing the store’s overall revenue.
and short-term sales forecasts.
Simulation
Predictive analytics It involves the use of probability and statistics to
It consists of techniques that use models constructed construct a computer model to study the impact of
from past data to predict the future or ascertain the uncertainty on a decision.
impact of one variable on another.
Example:
Survey data and past purchase behavior may be used to Banks often use simulation to model investment and default
help predict the market share of a new product. risk in order to stress test financial models.
Big Data
Big data
A set of data that cannot be managed, processed, or
analyzed with commonly available software in a reasonable
amount of time.
Big data represents opportunities. As they realize the advantages of these analytic techniques,
It also presents analytical challenges from a processing they often progress to more sophisticated techniques in an
point of view and consequently has itself led to an effort to reap the derived competitive advantage.
increase in the use of analytics.
More companies are hiring data scientists who know Predictive and prescriptive analytics are sometimes therefore
how to process and analyze massive amounts of data. referred to as advanced analytics.
Walmart handles over one million purchase transactions per Types of applications of analytics by application area:
hour. Financial analytics
Facebook processes more than 250 million picture uploads Use of predictive models
per day. To forecast future financial performance
To assess the risk of investment portfolios and
Business Analytics in Practice projects
To construct financial instruments such as
derivatives
The team uses descriptive and predictive analytics to support The models have generated an estimated annual savings of
employee hiring and to track and influence retention. $500 million.
Marketing analytics
Marketing is one of the fastest growing areas for the
application of analytics.
A better understanding of consumer behavior through the
treatment:
Working with the Georgia Institute of Technology,
Memorial Sloan-Kettering Cancer Center developed a real-
time prescriptive model to determine the optimal placement
of radioactive seeds for the treatment of prostate cancer.
Catholic Relief Services (CRS) is the official international Web analytics - It is the analysis of online activity, which includes,
humanitarian agency of the U.S. Catholic community. The CRS but is not limited to, visits to Web sites and social media sites
mission is to provide relief for the victims of both natural and such as Facebook and LinkedIn.
human-made disasters and to help people in need around the
world through its health, educational, and agricultural programs. Leading companies apply descriptive and advanced analytics to
data collected in online experiments to:
CRS uses an analytical spreadsheet model to assist in the Determine the best way to configure Web sites,
allocation of its annual budget based on the impact that its Position ads, and
various relief efforts and programs will have in different Utilize social networks for the promotion of products
countries. and services
Types of Data
Population
All elements of interest
Example:
With the thousands of publicly traded companies in the
United States, tracking and analyzing all of these stocks every day
would be too time consuming and expensive.
Sample
Subset of the population
Example:
The Dow represents a sample of 30 stocks of large public
companies based in the United States, and it is often interpreted
to represent the larger population of all publicly traded
companies.
For
Quantitative data Time series data
Data on which numeric and arithmetic operations, such Data collected over several time periods.
as addition, subtraction, multiplication, and division, can Graphs of time series data are frequently found in business
be performed. and economic publications.
Help analysts understand what happened in the past,
Example: identify trends over time, and project future levels for the
The values for Volume in the Dow data in Table 2.1 can be time series.
summed to calculate a total volume of all shares traded by
companies included in the Dow.
Categorical data
Data on which arithmetic operations cannot be
performed.
Example:
The data in the Industry column in Table 2.1 are categorical -
the number of companies in the Dow that are in the
telecommunications industry can be counted.
Cross-sectional data
Data collected from several entities at the same, or Figure 2.1 - Dow Jones Index Values Since 2002
approximately the same, point in time.
Example: (contd.)
Example:
Cross-sectional data: The data in Table 2.1 are cross-sectional Time series data: Figure 2.1 illustrates that the DJI was near
because they describe the 30 companies that comprise the Dow 10,000 in 2002 and climbed to above 14,000 in 2007. However,
at the same point in time (April 2013). the financial crisis in 2008 led to a significant decline in the DJI to
between 6000 and 7000 by 2009. Since 2009, the DJI has been
generally increasing and topped 14,000 in April 2013.
Sources of data
Example:
If a pharmaceutical firm is interested in conducting an
experiment to learn about how a new drug affects blood
pressure, then blood pressure is the variable of interest in the
study. The dosage level of the new drug is another variable that
is hoped to have a causal effect on blood pressure. To obtain
data about the effect of the new drug, researchers select a
sample of individuals. The dosage level of the new drug is
controlled as different groups of individuals are given different
dosage levels. Before and after the study, data on blood pressure
Figure 2.2 - Customer Opinion Questionnaire used by Chops City
are collected for each group. Statistical analysis of these
Grill Restaurant
experimental data can help determine how the new drug affects
blood pressure.
Example:
Nonexperimental study:
Nonexperimental study or observational study
Figure 2.2 shows a customer opinion questionnaire used by
Make no attempt to control the variables of interest.
Chops City Grill in Naples, Florida.
A survey is perhaps the most common type of observational
Note that the customers who fill out the questionnaire are
study.
asked to provide ratings for 12 variables, including overall
experience, the greeting by hostess, the table visit by the
manager, overall service, and so on.
The response categories of excellent, good, average, fair, and
poor provide categorical data that enable Chops City Grill Table 2.2 - Top 20 Selling Automobiles in United States in March
management to maintain high standards for the restaurant’s 2011
food and service. In some cases, the data needed for a particular Figure 2.3 - Top 20 Selling Automobiles Data entered into Excel
application already exist from an experimental or observational with Percent Change in Sales from 2010
study already conducted. Companies maintain a variety of
databases about their employees, customers, and business Figure 2.3 shows the data from Table 2.2 entered into an
operations. Excel spreadsheet, and the percent change in sales for each
model from March 2010 to March 2011 has been calculated.
Modifying Data in Excel This is done by entering the formula = (D2-E2)/E2 in cell F2
and then copying the contents of this cell to cells F3 to F20.
Sorting and filtering data in excel
Illustration - To sort the automobiles by March 2010 sales
Step 1: Select cells A1:F21
Step 2: Click the DATA tab in the Ribbon
Step 3: Click Sort in the Sort & Filter group
Step 4: Select the check box for My data has headers
Step 5: In the first Sort by dropdown menu, select Sales (March
2010)
Step 6: In the Order dropdown menu, select Largest to Smallest
Step 7: Click OK
accordingly.
Sorting and filtering data in excel
Illustration - Using Excel’s Filter function to see the sales of
models made by Toyota.
Figure 2.4 - Using Excel’s Sort Function to Sort the Top Selling Step 1: Select cells A1:F21
Automobiles Data Step 2: Click the DATA tab in the Ribbon
Figure 2.5 - Top Selling Automobiles Data Sorted by Sales in Step 3: Click Filter in the Sort & Filter group
March 2010 Sales Step 4: Click on the Filter Arrow in column B, next to
Manufacturer
The result of using Excel’s Sort function for the March 2010 Step 5: Select only the check box for Toyota. You can easily
data is shown in Figure 2.5. deselect all choices by unchecking (Select All)
Although the Honda Accord was the best-selling automobile
in March 2011, both the Toyota Camry and the Toyota
Corolla/Matrix outsold the Honda Accord in March 2010.
Note that while Sales (March 2010), which is in column E, is
sorted, the data in all other columns are adjusted
Step 2: Click on the HOME tab in the Ribbon
Step 3: Click Conditional Formatting in the Styles group
Figure 2.6 - Top Selling Automobiles Data Filtered to Show Only Step 4: Select Highlight Cells Rules, and click Less Than from the
Automobiles Manufactured by Toyota dropdown menu
Step 5: Enter 0% in the Format cells that are LESS THAN: box
The result (Figure 2.6) is a display of only the data for models Step 6: Click OK
made by Toyota. Figure 2.7 - Using Conditional Formatting in Excel to Highlight
Of the 20 top-selling models in March 2011, Toyota made Automobiles with Declining Sales from March 2010
three of them. Here, the models with decreasing sales (Toyota Camry, Ford
Further filter the data by choosing the down arrows in the Focus, Chevrolet Malibu, and Nissan Versa) are now clearly
other columns. visible.
All data can be made visible again by clicking on the down
arrow in column B and checking (Select All) or by clicking
Filter in the Sort & Filter Group again from the DATA tab.
Conditional Formatting of Data in Excel:
Makes it easy to identify data that satisfy certain
conditions in a data set.
Illustration - To identify the automobile models in Table 2.2 for
which sales had decreased from March 2010 to March 2011.
Step 1: Starting with the original data shown in Figure 2.3, select
cells F1:F21
In this case, we want to count the number of times Coca- Table 2.5 - Relative Frequency and Percent Frequency
Cola appears in the sample data. The result is a value of 19 in Distributions of Soft Drink Purchases
cell E2, indicating that Coca-Cola appears 19 times in the
sample data. Table 2.4 shows that the relative frequency for Coca-Cola is
19/50 = 0.38, the relative frequency for Diet Coke is 8/50 =
The formula from cell E2 to cells E3 to E6 can be copied to 0.16, and so on.
get frequency counts for Diet Coke, Pepsi, Dr. Pepper, and
Sprite. By using the absolute reference $A$2:$B$26 in the From the percent frequency distribution, it is seen that 38
formula. percent of the purchases were Coca-Cola, 16 percent of the
purchases were Diet Coke, and so on.
Relative frequency and percent frequency distributions
Relative frequency distribution Note that 38 percent + 26 percent + 16 percent = 80 percent
It is a tabular summary of data showing the relative of the purchases were the top three soft drinks.
frequency for each bin.
Frequency distributions for quantitative data
Percent frequency distribution Three steps necessary to define the classes for a frequency
Summarizes the percent frequency of the data for each bin. distribution with quantitative data:
1. Determine the number of nonoverlapping bins.
Used to provide estimates of the relative likelihoods of 2. Determine the width of each bin.
different values of a random variable. 3. Determine the bin limits.
Figure 2.10 shows the data from Table 2.6 entered into an
Excel Worksheet.
The sample of 20 audit times is contained in cells A2:D6.
The upper limits of the defined bins are in cells A10:A14.
Figure 2.13 - Completed Histogram for the Audit Time Data using
Data Analysis ToolPak in Excel
Panel D:
Highly skewed to the right
Example: Data on housing prices, salaries, purchase amounts,
and so on often result in histograms skewed to the right.
Histogram
provides information about the shape, or form, of a distribution.
Skewness
Lack of symmetry
Figure 2.14 - Histograms Showing Distributions with Different
Levels of Skewness Important characteristic of the shape of a distribution
Panel A: Cumulative Distributions
or equal to 24. Hence, the
Cumulative frequency distribution cumulative frequency for this class is 17.
A variation of the frequency distribution that provides In addition, the cumulative frequency distribution in Table
another tabular summary of quantitative data. 2.8 shows that four audits were completed in 14 days or less
and that 19 audits were completed in 29 days or less.
Uses the number of classes, class widths, and class limits The cumulative relative frequency distribution can be
developed for the frequency distribution. computed either by summing the relative frequencies in the
Shows the number of data items with values less than or relative frequency distribution or by dividing the cumulative
equal to the upper class limit of each class. frequencies by the total number of items.
Using the latter approach, we found the cumulative relative
frequencies in column 3 of Table 2.8 by dividing the
cumulative frequencies in column 2 by the total number of
items (n = 20).
The cumulative percent frequencies were again computed by
multiplying the relative frequencies by 100.
The cumulative relative and percent frequency distributions
show that 0.85 of the audits, or 85 percent, were completed
in 24 days or less, 0.95 of the audits, or 95 percent, were
completed in 29 days or less, and so on.
Table 2.8 - Cumulative Frequency, Cumulative Relative
Frequency, and Cumulative Percent Frequency Distributions for Measures of Location
the Audit Time Data
Mean/Arithmetic
Mean/Arithmetic
Consider the class mean
with the description “Less than or equal to Average value for a variable
24.” Average value for a variable. The mean is devoted by x
The cumulative frequency for this class is simply the sum of
The
the frequencies for all mean
classes is denoted
with data by or´𝑥 .
values less than
equal to 24.
The sum of the frequencies for classes 10–14, 15–19, and
20–24 indicates that 4 + 8 + 5 = 17 data values are less than n sample size
x1 value of x for the first observation
n = sample size
𝑥 = value of variable x for the first observation
1
𝑥
x2 value of x for the second observation
xn value of x for the nth observation
Median
Illustration Value in the middle when the data are arranged in
Computation of the mean home selling price for the sample ascending order.
of 12 home sales:
Middle value, for an odd number of observations
Illustration
When the number of observations are odd
Consider the class size data for a sample of five college classes:
46 54 42 46 32
Arrange the class size data in ascending order .
Table 2.9 - Data on Home Sales in Cincinnati, Ohio, Suburb 32 42 46 46 54
Middlemost value in the data set = 46.
Median is 46.
Consider the data on home sales in Cincinnati, Ohio, Suburb: Consider the class size data:
32 42 46 46 54
Multimodal data
Data contain at least two modes.
Bimodal data
Data contain exactly two modes.
Illustration (contd.)
When the number of observations are even
Arrange the
data in
ascending
order:
Excel enters the values for both modes of this data set in Solution:
cells E4 and E5: $138,000 and $254,000. Product of the growth factors:
Geometric mean
nth root of the product of n values
Geometric mean of the growth factors:
Used in analyzing growth rates in financial data
Sample geometric mean: Conclude that annual returns grew
at an average annual rate of (1.09-1) 100 % or 2.9%
Largest home sales price - $456,250
Smallest home sales price - $108,000
Range = Largest value – Smallest value
= $456,250 – $108,000
= $348,250
Drawback
Range is based on only two of the observations and thus is
Figure 2.16 - Calculating the Geometric Mean for the Mutual highly influenced by extreme values.
Fund Data Using Excel
Variance
In Figure 2.16, the value for the geometric mean in cell C13 is Measure of variability that utilizes all the data
found using the formula =GEOMEAN(C2:C11).
It is based on the deviation about the mean, which is the
Measures of Variability difference between the value of each observation (xi) and the
mean.
Range
Found by subtracting the smallest value from the largest The deviations about the mean are squared while computing
value in a data set. the variance.
Population variance,
Expressed as a percentage
Standard deviation
Positive square root of the variance
For sample,
For population,
Coefficient of variation