Ebook - Statistics Fundamentals For Business Analytics
Devesh Bathla
1. What is Statistics, its types and key characteristics?
Statistics is a discipline that can be applied by anyone, in almost any career or field. It deals with
populations and samples, quantitative and qualitative data, and much more.
Two types :
Characteristics of statistics
a) Primary data
Data that is collected for the first time, so it is fresh. It is collected by the researcher directly
from the people and respondents of interest, for example through interviews and surveys.
b) Secondary data
Data that has already been published or collected by someone else. It can be related to your topic,
but it was not collected specifically for your topic.
Classification of data
5. Types of Sampling?
a) Probability Sampling - a sample from a larger population is chosen using a method based on the
theory of probability. For a sample to count as a probability sample, its participants must be
selected by random selection.
b) Non-Probability Sampling - the samples are gathered in a process that does not give all the
individuals in the population an equal chance of being selected.
Probability Sampling
Simple random sampling - In a simple random sample (SRS) of a given size, every subset of that size
from the sampling frame has an equal probability of being selected.
Cluster sampling - The population is divided into clusters on the basis of demographic or geographic
parameters, and whole clusters are then selected at random to form the sample.
Systematic sampling - Systematic sampling (also known as interval sampling) relies on arranging the
study population according to some ordering scheme and then selecting elements at regular intervals
through that ordered list.
Stratified sampling - When the population embraces a number of distinct categories, the frame can be
organized by these categories into separate "strata." Each stratum is then sampled as an independent
sub-population, out of which individual elements can be randomly selected.
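The simple random, systematic and stratified methods described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `customers` frame, the `north`/`south` strata and the function names are hypothetical and do not come from the text.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every k-th element of the ordered frame."""
    k = len(frame) // n            # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)       # random start inside the first interval
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, seed=None):
    """Sample each stratum independently, as its own sub-population."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical sampling frame: 100 customer IDs, split into two strata.
customers = list(range(1, 101))
strata = {"north": customers[:50], "south": customers[50:]}

srs = simple_random_sample(customers, 10, seed=1)
sys_sample = systematic_sample(customers, 10, seed=1)
strat = stratified_sample(strata, 5, seed=1)
```

The stratified version draws the same number of elements from each stratum; drawing in proportion to stratum size is an equally common choice.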
Non-Probability Sampling
Convenience sampling - It is a non-probability sampling technique in which samples are selected from
the population only because they are conveniently available to the researcher.
Quota sampling - It is a non-probability sampling technique in which the researcher divides
respondents into groups and gathers data until a preset quota for each group is filled.
6. Data Presentation
Data can be presented in two forms: tabular or graphical. Data presentation is very important in
business analysis because, by viewing, studying and analyzing the data, a business analyst can predict
future outcomes or propose solutions to problems.
Tabular
• Tabular data is a way to show data in table form.
• It arranges data systematically in rows and columns.
• It simplifies the complex data.
Graphical
• Graphical representation is a visual display of data and is often a more effective way to represent data.
• Graphical representation makes data more understandable to others.
• There are many different types of graphical representation such as pie chart, line chart, bar
diagram, histogram, scatter plot and others.
Bar Graph
Histogram
Line Chart
Pie Chart
7. Measures of Central Value
b) Discrete series
c) Continuous series
Mean - the average of the data, obtained by dividing the sum of all values by the number of values.
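As a small illustration, the mean can be computed for a plain list of values and for a discrete series in which each value has a frequency. The numbers and function names below are made up for the example.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def mean_discrete(values, freqs):
    """Mean of a discrete series: sum of f * x divided by the sum of f."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)

sales = [10, 20, 30, 40]                                # hypothetical individual observations
avg_sales = mean(sales)                                 # (10 + 20 + 30 + 40) / 4 = 25.0
avg_weighted = mean_discrete([10, 20, 30], [1, 2, 1])   # (10 + 40 + 30) / 4 = 20.0
```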
Range = L - S, where L is the largest value and S is the smallest value.
Coefficient of range = (L - S) / (L + S)
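A short sketch of both measures, reading L as the largest value and S as the smallest value in the data. The sample data is hypothetical.

```python
def value_range(values):
    """Range = L - S (largest value minus smallest value)."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Coefficient of range = (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

data = [20, 35, 50, 80]            # hypothetical observations
r = value_range(data)              # 80 - 20 = 60
cr = coefficient_of_range(data)    # 60 / 100 = 0.6
```

Note the parentheses: the whole difference is divided by the whole sum.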
Mean Deviation
MD = (∑ f |D|) / N
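The mean deviation formula can be sketched as follows. The code assumes that f is the frequency of each value, |D| is the absolute deviation of each value from the mean, and N is the total frequency ∑ f. Deviations are sometimes taken from the median instead, and the data is hypothetical.

```python
def mean_deviation(values, freqs):
    """Mean deviation about the mean: sum of f * |D| divided by N."""
    n = sum(freqs)                                        # N = total frequency
    centre = sum(x * f for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / n

# Hypothetical discrete series: the mean is 20, so |D| = 10, 0, 10.
md = mean_deviation([10, 20, 30], [1, 2, 1])   # (10 + 0 + 10) / 4 = 5.0
```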
Standard Deviation
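As an illustration, the standard deviation is the square root of the average squared deviation from the mean. The sketch below computes the population form by hand and checks it against Python's `statistics.pstdev`. The data is made up for the example.

```python
import math
import statistics

def population_sd(values):
    """Square root of the mean squared deviation from the mean."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical observations, mean = 5
sd_manual = population_sd(data)        # sqrt(32 / 8) = 2.0
sd_library = statistics.pstdev(data)   # same quantity from the standard library
```

For a sample rather than a whole population, the divisor is usually n - 1, which is what `statistics.stdev` uses.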
Correlation - shows whether two variables are related to each other and how strongly they are
related.
Coefficient of Correlation
When r = +1, there is a perfect positive relationship between the variables.
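A minimal sketch of Pearson's coefficient of correlation, assuming that this is the r meant here. The variable names and figures are hypothetical. It gives r = +1 when one variable rises exactly in line with the other, and r = -1 when it falls exactly in line.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

ad_spend = [1, 2, 3, 4, 5]             # hypothetical data
revenue = [2, 4, 6, 8, 10]             # rises exactly with ad_spend
returns = [10, 8, 6, 4, 2]             # falls exactly as ad_spend rises

r_pos = pearson_r(ad_spend, revenue)   # perfect positive relationship
r_neg = pearson_r(ad_spend, returns)   # perfect negative relationship
```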
Regression: It describes the relationship between the mean value of one variable (the dependent
variable) and the values of one or more other variables (the independent variables).
It also tells us how a change in one factor affects the others.
USES OF REGRESSION
• Predictive analysis
• Operational efficiency
• Supporting decisions
• Correcting errors and gaining new insights
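To illustrate the predictive use, here is a sketch of simple linear regression fitted by ordinary least squares. This is one common form of regression, not the only one, and the data and names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

months = [1, 2, 3, 4, 5]               # hypothetical independent variable
units_sold = [3, 5, 7, 9, 11]          # hypothetical dependent variable

slope, intercept = fit_line(months, units_sold)   # slope 2.0, intercept 1.0
forecast = intercept + slope * 6                  # prediction for month 6: 13.0
```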
The central limit theorem states that the distribution of sample means approaches a normal
distribution as the sample size increases, even when the underlying data is not normally distributed.
A sample size of at least 30 is a common rule of thumb, and the larger the sample, the closer the
distribution of sample means is to normal.
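A small simulation can make this concrete. The sketch below assumes a right-skewed exponential population with mean 1, which is an arbitrary choice for illustration, and compares the skewness of sample means for samples of size 2 and size 30. A skewness near 0 indicates a symmetric, more bell-like shape.

```python
import random

rng = random.Random(42)   # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from a right-skewed exponential population (mean 1)."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

def skewness(values):
    """Moment-based skewness: 0 for a perfectly symmetric distribution."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / n
    return sum((v - m) ** 3 for v in values) / n / var ** 1.5

means_small = [sample_mean(2) for _ in range(2000)]    # sample size 2
means_large = [sample_mean(30) for _ in range(2000)]   # sample size 30

print("skewness, n = 2: ", skewness(means_small))
print("skewness, n = 30:", skewness(means_large))
```

The means of the larger samples should show much less skew, and their average should sit close to the population mean of 1.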
Bell curve - When data is normally distributed, plotting it produces a symmetric, bell-shaped curve.
If the curve is not symmetric, the distribution is skewed: it can be left-skewed or
right-skewed.
Error-term
• It is also known as the residual.
• The error term is the difference between the actual values and the predicted values.
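As a quick sketch, residuals can be computed by subtracting each predicted value from the corresponding actual value, which is the usual sign convention. The numbers are hypothetical.

```python
actual = [3.0, 5.5, 6.5, 9.0]       # hypothetical observed values
predicted = [3.0, 5.0, 7.0, 9.0]    # hypothetical model predictions

# Residual = actual value minus predicted value.
residuals = [a - p for a, p in zip(actual, predicted)]   # [0.0, 0.5, -0.5, 0.0]

# Sum of squared errors, a common summary of how far off the predictions are.
sse = sum(r ** 2 for r in residuals)                     # 0.25 + 0.25 = 0.5
```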
Time series
Time series analysis is another statistics topic. It studies data recorded over time, which lets us
identify trends and, in some cases, forecast future values.
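One simple way to expose a trend is a moving average, sketched below. The technique is chosen here for illustration and is not prescribed by the text. The monthly figures are hypothetical.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive observations."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

monthly_sales = [10, 12, 14, 13, 15, 18]     # hypothetical time series
trend = moving_average(monthly_sales, 3)     # starts 12.0, 13.0, 14.0, ...

# A naive forecast for the next month: reuse the latest smoothed value.
naive_forecast = trend[-1]
```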
P - Value
The p-value is also very important in statistics. It tells us whether a result is statistically
significant. By convention, if the p-value is less than 0.05 the result is said to be statistically
significant. If it is more than that, the result is not statistically significant, meaning the data
does not provide enough evidence to support the effect being tested.
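As a hedged illustration, the sketch below computes a two-sided p-value for a one-sample z-test. This is only one of many tests, picked because it needs nothing beyond the standard library. It assumes the population standard deviation is known, and all figures and names are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_p_value(sample_mean, pop_mean, pop_sd, n):
    """Two-sided p-value for a one-sample z-test with known population sd."""
    z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical case: 36 orders average 52 against a claimed mean of 50 (sd 5).
p = z_test_p_value(sample_mean=52, pop_mean=50, pop_sd=5, n=36)   # z = 2.4
significant = p < 0.05    # compare with the conventional 0.05 threshold
```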