E-Book On Essentials of Business Analytics: Group 7
Group 7
Written Report
E-book on Essentials of
Business Analytics
What Is Data?
Data is a collection of information and statistics used for reference or analysis.
According to Kids Do Ecology, data are the information gained from observing and
testing an experiment. Scientists use data to gain understanding and draw conclusions.
Here is an example of data presented using a graph.
Data can be categorized in several ways, based on how they are collected and on
the type of data collected. In many cases, it is not feasible to collect data from the
population of all elements of interest. In such instances, we collect data from a
subset of the population known as a sample. For example, with thousands
of publicly traded companies in the United States, tracking and analyzing all of
these stocks every day would be too time-consuming and expensive. The Dow Jones
Industrial Average (the Dow) represents a sample of 30 stocks of large public
companies based in the United States, and it is often interpreted to represent the
larger population of all publicly traded companies.
It is very important to collect sample data that are representative of the
population so that generalizations can be made from them. In most cases
(although not for the Dow), a representative sample can be gathered by
random sampling of the population. Working with populations rather than
samples can introduce subtle differences in how we calculate and interpret
summary statistics. In almost all practical applications of business analytics, we
will be dealing with sample data.
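Random sampling of a population can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions: the population is a hypothetical list of 3,000 made-up company identifiers, and the sample size of 30 only mirrors the size of the Dow (which, as noted, is not itself a random sample).

```python
import random

# Hypothetical population: 3,000 made-up company identifiers (illustrative only)
population = [f"COMPANY_{i:04d}" for i in range(3000)]

# Fix the seed so the draw is reproducible
random.seed(7)

# Simple random sample of 30 companies, drawn without replacement:
# every company has the same chance of being selected
sample = random.sample(population, k=30)

print(len(sample))   # 30
print(sample[:3])    # the first three selected identifiers
```

Changing or removing the seed produces a different sample, which is exactly why statistics computed from a sample are estimates of the population values rather than the values themselves.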
Quantitative and Categorical Data
Quantitative Data: Data are quantitative if numeric and arithmetic operations, such as
addition, subtraction, multiplication and division, can be performed on them.
Categorical Data: Data are categorical if arithmetic operations cannot be performed on them.
Categorical data can be summarized by counting the number of observations
or computing the proportion of observations in each category.
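Summarizing categorical data by counts and proportions can be sketched in a few lines of Python. The data below are hypothetical (the soft-drink brand chosen by each of ten customers), and `collections.Counter` is simply one convenient way to tally categories.

```python
from collections import Counter

# Hypothetical categorical sample: soft-drink brand chosen by each of 10 customers
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke",
             "Pepsi", "Coke", "Sprite", "Pepsi", "Coke"]

counts = Counter(purchases)   # number of observations in each category
n = len(purchases)
proportions = {category: count / n for category, count in counts.items()}

# Print categories from most to least frequent
for category, count in counts.most_common():
    print(f"{category}: count={count}, proportion={proportions[category]:.2f}")
# Coke: count=5, proportion=0.50
# Pepsi: count=3, proportion=0.30
# Sprite: count=2, proportion=0.20
```

Note that no arithmetic is done on the brand names themselves; only the counts of each category are added and divided.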
Measures of Location
a) Mean (Arithmetic Mean) - The most commonly used measure of location is the
mean (arithmetic mean), or average value, for a variable. The mean provides a
measure of central location for the data. If the data are for a sample (typically
the case), the mean is denoted by x̄ (read "x-bar") and is computed by dividing
the sum of the observations by the number of observations: x̄ = Σxᵢ / n. The sample
mean is a point estimate of the (typically unknown) population mean for the variable
of interest. If the data for the entire population are available, the population mean
is computed in the same manner, but denoted by the Greek letter μ (mu).
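A minimal sketch of the sample mean, using a hypothetical sample of eight daily order counts. The manual formula is checked against Python's standard `statistics.mean`.

```python
from statistics import mean

# Hypothetical sample of eight daily order counts (illustrative values only)
orders = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample mean: x̄ = (x1 + x2 + ... + xn) / n
n = len(orders)
x_bar = sum(orders) / n

print(x_bar)                   # 5.0
print(x_bar == mean(orders))   # True: the standard-library function agrees
```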
b) Median - The median, another measure of central location, is the value in the
middle when the data are arranged in ascending order (smallest to largest
value). With an odd number of observations, the median is the middle value.
An even number of observations has no single middle value. In this case, we
follow convention and define the median as the average of the values for the
middle two observations.
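The median rule above (middle value for odd n, average of the middle two for even n) can be sketched as follows. The helper name `sample_median` and the data are hypothetical; Python's `statistics.median` follows the same convention and is used as a cross-check.

```python
from statistics import median

# Hypothetical sample of eight daily order counts (n = 8, an even number)
orders = [2, 4, 4, 4, 5, 5, 7, 9]

def sample_median(values):
    """Middle value of the sorted data; average of the middle two when n is even."""
    ordered = sorted(values)          # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average of middle two

print(sample_median(orders))      # 4.5  (average of 4 and 5)
print(sample_median([3, 1, 2]))   # 2    (odd n: the middle value)
print(median(orders))             # 4.5  (standard library agrees)
```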
c) Mode - A third measure of location, the mode, is the value that occurs most
frequently in a data set. To illustrate the identification of the mode, consider
the sample of five class sizes.
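Identifying the mode amounts to counting how often each value occurs. The sketch below uses hypothetical class sizes chosen only for illustration, and a hypothetical helper `modes` that returns every most-frequent value, since a data set can have more than one mode. Python 3.8+ also provides `statistics.multimode` for the same purpose.

```python
from collections import Counter
from statistics import multimode

# Hypothetical class sizes, chosen only for illustration
class_sizes = [28, 35, 31, 28, 40]

def modes(values):
    """Return every value that occurs with the highest frequency (there may be several)."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes(class_sizes))       # [28]
print(modes([1, 1, 2, 2, 3]))   # [1, 2]  (bimodal data)
print(multimode(class_sizes))   # [28]    (standard library, Python 3.8+)
```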
d) Geometric Mean - The geometric mean is a measure of location that is
calculated by finding the nth root of the product of n values.
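The geometric mean is commonly used to average growth rates, because growth compounds multiplicatively. A minimal sketch, assuming hypothetical annual growth factors for an investment (1.10 means +10%, 0.95 means -5%); `geo_mean` is a hypothetical helper, cross-checked against `statistics.geometric_mean` (Python 3.8+).

```python
import math
from statistics import geometric_mean

# Hypothetical annual growth factors for an investment (1.10 = +10%, 0.95 = -5%)
growth = [1.10, 0.95, 1.20, 1.05]

def geo_mean(values):
    """nth root of the product of n positive values."""
    n = len(values)
    return math.prod(values) ** (1 / n)

g = geo_mean(growth)
print(round(g, 4))                              # 1.0712, about 7.1% average growth per year
print(math.isclose(g, geometric_mean(growth)))  # True
```

Averaging the same factors arithmetically gives 1.075 (7.5% per year), which overstates the compound growth: the actual cumulative factor is 1.10 × 0.95 × 1.20 × 1.05 ≈ 1.3167, and 1.0712⁴ ≈ 1.3167.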
Measures of Variability
a) Range - The simplest measure of variability is the range. The range can be
found by subtracting the smallest value from the largest value in a data set.
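A short sketch of the range on a hypothetical sample of eight daily order counts. Because the range depends on only the two most extreme values, a single unusual observation can change it dramatically, as the second calculation shows.

```python
# Hypothetical sample of eight daily order counts (illustrative values only)
orders = [2, 4, 4, 4, 5, 5, 7, 9]

# Range = largest value - smallest value
data_range = max(orders) - min(orders)
print(data_range)   # 7

# Replacing the largest value with an extreme one changes the range sharply
with_outlier = orders[:-1] + [30]
outlier_range = max(with_outlier) - min(with_outlier)
print(outlier_range)   # 28
```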
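b) Variance - The variance is a measure of variability that utilizes all the data. The
variance is based on the deviation about the mean, which is the difference
between the value of each observation (xᵢ) and the mean. For a sample, the variance is
s² = Σ(xᵢ - x̄)² / (n - 1); for a population of N values, it is σ² = Σ(xᵢ - μ)² / N.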
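The variance calculation can be sketched step by step: compute each deviation about the mean, square the deviations, sum them, then divide by n - 1 for a sample (or by N if the data were the entire population). The data are a hypothetical sample of eight daily order counts; results are cross-checked with `statistics.variance` and `statistics.pvariance`.

```python
import math
from statistics import pvariance, variance

# Hypothetical sample of eight daily order counts (illustrative values only)
orders = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(orders)
x_bar = sum(orders) / n
deviations = [x - x_bar for x in orders]   # deviation about the mean, xᵢ - x̄
sum_sq = sum(d ** 2 for d in deviations)   # sum of squared deviations

sample_var = sum_sq / (n - 1)   # sample variance s², divides by n - 1
population_var = sum_sq / n     # population variance σ², if these 8 values were the whole population

print(deviations)        # [-3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0]
print(sum(deviations))   # 0.0  (deviations about the mean always sum to zero)
print(round(sample_var, 4), population_var)   # 4.5714 4.0
print(math.isclose(sample_var, variance(orders)),
      math.isclose(population_var, pvariance(orders)))   # True True
```

Squaring the deviations is what keeps the positive and negative deviations from cancelling out, as they do in the plain sum.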
c) Standard Deviation - The standard deviation is defined to be the positive
square root of the variance. We use s to denote the sample standard deviation
and σ (sigma) to denote the population standard deviation.
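A minimal sketch of s and σ as square roots of the corresponding variances, again on a hypothetical sample of eight daily order counts. Unlike the variance, which is in squared units, the standard deviation is in the same units as the data (here, orders per day).

```python
import math
from statistics import pstdev, stdev

# Hypothetical sample of eight daily order counts (illustrative values only)
orders = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(orders)
x_bar = sum(orders) / n
sum_sq = sum((x - x_bar) ** 2 for x in orders)

s = math.sqrt(sum_sq / (n - 1))   # sample standard deviation
sigma = math.sqrt(sum_sq / n)     # population standard deviation

print(round(s, 3), sigma)   # 2.138 2.0
print(math.isclose(s, stdev(orders)), math.isclose(sigma, pstdev(orders)))   # True True
```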
d) Coefficient of Variation - In some situations we may be interested in a
descriptive statistic that indicates how large the standard deviation is relative
to the mean. This measure is called the coefficient of variation and is usually
expressed as a percentage: (standard deviation / mean) × 100%.
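The sketch below computes the coefficient of variation for two hypothetical samples that have exactly the same standard deviation but very different means. It shows why the measure is useful: relative to its mean, the first sample is far more variable than the second.

```python
from statistics import mean, stdev

# Hypothetical sample of eight daily order counts (illustrative values only)
orders = [2, 4, 4, 4, 5, 5, 7, 9]
# Hypothetical second sample: the same values shifted up by 100
shifted = [x + 100 for x in orders]

# Coefficient of variation = (standard deviation / mean) × 100%
cv = stdev(orders) / mean(orders) * 100
cv_shifted = stdev(shifted) / mean(shifted) * 100

print(f"{cv:.1f}%")           # 42.8%
print(f"{cv_shifted:.1f}%")   # 2.0%  (same standard deviation, much larger mean)
```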
4) Analyzing Distributions and Their Definitions - Distributions are very useful for
interpreting and analyzing data. A distribution describes the overall variability of
the observed values of a variable.
a) Scatter Charts - A scatter chart is a useful graph for analyzing the relationship
between two variables.
b) Covariance - Covariance is a descriptive measure of the linear association
between two variables.
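The sample covariance is s_xy = Σ(xᵢ - x̄)(yᵢ - ȳ) / (n - 1). A minimal sketch, assuming a hypothetical paired sample of advertising spend (in thousands of dollars) and units sold; `sample_covariance` is a hypothetical helper. Re-expressing the same spend in dollars shows that the covariance depends on the units of measurement.

```python
# Hypothetical paired sample: advertising spend (thousands of dollars) and units sold
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

def sample_covariance(x, y):
    """s_xy = Σ (xᵢ - x̄)(yᵢ - ȳ) / (n - 1)"""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

cov = sample_covariance(ad_spend, sales)
print(cov)   # 1.5 (positive: the two variables tend to move together)

# Expressing spend in dollars instead of thousands scales the covariance by 1000
ad_spend_dollars = [v * 1000 for v in ad_spend]
cov_dollars = sample_covariance(ad_spend_dollars, sales)
print(cov_dollars)   # 1500.0
```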
c) Correlation Coefficient - The correlation coefficient measures the strength of the
linear relationship between two variables x and y. Unlike the covariance, it is not
affected by the units of measurement for x and y, and its value always lies between -1 and +1.
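The sample correlation coefficient divides the covariance by the product of the two sample standard deviations: r_xy = s_xy / (s_x s_y). The sketch below uses the same kind of hypothetical data as a covariance example (advertising spend in thousands of dollars and units sold), with hypothetical helper names, and shows that changing the units of x leaves r unchanged.

```python
import math

# Hypothetical paired sample: advertising spend (thousands of dollars) and units sold
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

def sample_covariance(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """r_xy = s_xy / (s_x * s_y); a variable's sample variance is its covariance with itself."""
    s_x = math.sqrt(sample_covariance(x, x))
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

r = correlation(ad_spend, sales)
r_dollars = correlation([v * 1000 for v in ad_spend], sales)
print(round(r, 3), round(r_dollars, 3))   # 0.775 0.775 (unchanged by the change of units)
```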