Ebook - Statistics Fundamentals For Business Analytics


e-Book on ‘Statistics Fundamentals for Business Analytics’ by Dr. Devesh Bathla
1. What is statistics, and what are its types and key characteristics?
Statistics is the science of collecting, organizing, analyzing and interpreting data, and it can be used by anyone in any career or field. It deals with populations and samples, quantitative and qualitative data, and much more.

There are two types:

a) Descriptive statistics (used to summarize and describe the data)

b) Inferential statistics (used to draw conclusions about a population from a sample, make predictions and support decisions)

Characteristics of statistics

• It consists of aggregates of facts
• It is affected by many causes
• It should be numerically expressed
• It must be enumerated or estimated accurately
• It should be collected in a systematic manner
• It should be collected for a predetermined purpose
• It should be possible to place the figures in relation to each other

2. Five Stages of Statistics

3. Types of Data and Their General Classification

a) Primary data
Data that is collected for the first time, so it is fresh. The researcher collects it directly from the people or respondents he/she needs, for example through interviews and surveys.

b) Secondary data
Data that has already been published or collected by someone else. It may be related to your topic, but it was not collected specifically for your topic.

Classification of data

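a) Geographical Data (by area, e.g. city, state)
b) Chronological Data (on the basis of time)
c) Qualitative Data (based on an attribute or quality, e.g. sex, hair color)
d) Quantitative Data (based on a numerical characteristic, e.g. height, weight, profits, sales)

4. Sampling

Sampling is the way you decide how to select the people or respondents from whom you will collect data, together with the reason for choosing that method to form the sample.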

5. Types of Sampling

a) Probability Sampling - a method in which samples from a larger population are chosen using a procedure based on the theory of probability. For a participant to be part of a probability sample, he/she must be chosen through random selection.

b) Non-Probability Sampling - a method in which samples are gathered through a process that does not give every individual in the population an equal chance of being selected.

Probability Sampling

Simple random sampling - In a simple random sample (SRS) of a given size, every subset of the sampling frame of that size has an equal probability of being selected.
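A minimal sketch of drawing a simple random sample in Python. The sampling frame of customer IDs and the function name `simple_random_sample` are hypothetical; `random.Random.sample` draws without replacement, so every subset of the chosen size is equally likely.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every n-sized subset is equally likely."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

# Hypothetical sampling frame of 100 customer IDs
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

Passing a seed is only for reproducibility; leaving it as `None` gives a fresh random draw each run.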

Cluster sampling - The population is divided into clusters, often on the basis of geographic or demographic parameters; a random sample of clusters is then selected, and the units within the selected clusters are surveyed.
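A minimal sketch of one-stage cluster sampling, assuming each cluster is a hypothetical city mapped to a list of store IDs: a few whole clusters are chosen at random and every unit inside them is kept. The function name `cluster_sample` is illustrative only; in a two-stage design, a random sample would instead be drawn within each chosen cluster.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: pick whole clusters at random, keep all their units."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random choice of cluster names
    # Every unit in each chosen cluster enters the sample
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: stores grouped by city
clusters = {
    "City A": ["A1", "A2", "A3"],
    "City B": ["B1", "B2"],
    "City C": ["C1", "C2", "C3", "C4"],
    "City D": ["D1", "D2"],
}
sample = cluster_sample(clusters, 2, seed=1)
print(sample)
```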
Systematic sampling - Systematic sampling (also known as interval sampling) relies on arranging the
study population according to some ordering scheme and then selecting elements at regular intervals
through that ordered list.
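A minimal sketch of systematic sampling, assuming the frame is already ordered (here hypothetical invoice numbers sorted by date) and the sample size does not exceed the frame size. The interval k is the frame size divided by the sample size, the starting point is chosen at random within the first interval, and every k-th element follows.

```python
import random

def systematic_sample(ordered_frame, n, seed=None):
    """Pick a random start, then every k-th element of an ordered list."""
    k = len(ordered_frame) // n  # sampling interval
    start = random.Random(seed).randrange(k)  # random start within the first interval
    return ordered_frame[start::k][:n]  # keep exactly n elements

# Hypothetical frame: 50 invoices ordered by date
invoices = [f"INV{i:03d}" for i in range(1, 51)]
sample = systematic_sample(invoices, 5, seed=7)
print(sample)
```

With 50 invoices and a sample of 5, the interval is 10, so the selected invoices are always 10 positions apart.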

Stratified sampling- When the population embraces a number of distinct categories, the frame can be
organized by these categories into separate "strata." Each stratum is then sampled as an independent
sub-population, out of which individual elements can be randomly selected.
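A minimal sketch of stratified sampling using proportional allocation, which is one common choice rather than the only one (strata can also be sampled in equal numbers). The customer segments, IDs and the function name `stratified_sample` are hypothetical; each stratum is sampled independently with a simple random sample.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Sample the same fraction independently from each stratum (proportional allocation)."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        k = max(1, round(len(units) * fraction))  # at least one unit per stratum
        sample[name] = rng.sample(units, k)  # independent SRS within the stratum
    return sample

# Hypothetical strata: customers grouped by segment
strata = {
    "retail": [f"R{i}" for i in range(60)],
    "wholesale": [f"W{i}" for i in range(30)],
    "online": [f"O{i}" for i in range(10)],
}
sample = stratified_sample(strata, 0.1, seed=3)
print({name: len(units) for name, units in sample.items()})  # {'retail': 6, 'wholesale': 3, 'online': 1}
```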

Non-Probability Sampling

Convenience sampling - It is a non-probability sampling technique in which samples are selected from the population only because they are conveniently available to the researcher.

Judgmental or purposive sampling - The samples are selected based purely on the researcher’s knowledge and judgment. The researcher chooses only those people he/she feels are right to participate in the research study.

Snowball sampling - It helps researchers find a sample when subjects are difficult to locate. Researchers use this technique when the target population is small and not easily accessible. Once the researchers find suitable subjects, those subjects are asked to help find similar subjects, until a sample of reasonable size is formed.

Quota sampling - The researcher divides respondents into groups (for example by age or gender) and collects data non-randomly from each group until a fixed number, or quota, is reached for that group.

6. Data Presentation

Data can be presented in two forms: tabular or graphical. Data presentation is very important in business analytics, because by viewing, studying and analyzing the data a business analyst can predict future outcomes or propose solutions to problems.
Tabular
• Tabular presentation is a way to show data in table form.
• Data is arranged systematically in rows and columns.
• It simplifies complex data.


Graphical
• Graphical representation is a visual display of data, and it is often a more effective way to represent data.
• Graphical representation makes data more understandable to others.
• There are many different types of graphical representation such as pie chart, line chart, bar
diagram, histogram, scatter plot and others.
[Figures: Bar Graph, Histogram, Line Chart, Pie Chart]

7. Measures of Central Value

Different types of series


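a) Individual series (each observation is listed separately)

b) Discrete series (distinct values are listed with their frequencies)

c) Continuous series (values are grouped into class intervals with their frequencies)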

Mean - the average of the data, obtained by dividing the sum of all the values by the number of values.
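A short sketch of the mean for an individual series and for a discrete series, using made-up sales figures. For a discrete series the same idea applies with each value weighted by its frequency, so the mean is the sum of f × X divided by the sum of f.

```python
from statistics import mean

# Individual series: hypothetical daily sales (units)
sales = [12, 15, 11, 18, 14]
print(mean(sales))  # (12 + 15 + 11 + 18 + 14) / 5 = 14

# Discrete series: value X with frequency f, mean = sum(f * X) / sum(f)
values = [10, 20, 30]
freqs = [2, 5, 3]
weighted_mean = sum(f * x for x, f in zip(values, freqs)) / sum(freqs)
print(weighted_mean)  # 210 / 10 = 21.0
```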

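Median - the middle value of the data when it is arranged in ascending or descending order.

• Position of the median (individual and discrete series): the $\frac{N+1}{2}$th item; for a continuous series, the median class is the class containing the $\frac{N}{2}$th item
• Median of a continuous series: $\text{Median} = L + \frac{\frac{N}{2} - cf}{f} \times i$, where $L$ is the lower limit of the median class, $cf$ the cumulative frequency of the class preceding it, $f$ the frequency of the median class and $i$ the class interval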
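A sketch of both median rules, with hypothetical marks and class frequencies. The function `grouped_median` is an illustrative name; it walks the cumulative frequencies to find the first class whose cumulative frequency reaches N/2 and then applies the interpolation formula for a continuous series.

```python
from statistics import median

# Individual series: middle item after sorting, position (N + 1) / 2
marks = [35, 42, 28, 50, 47]
print(median(marks))  # sorted: 28, 35, 42, 47, 50 -> 3rd item = 42

def grouped_median(lower_limits, width, freqs):
    """Median of a continuous series: L + ((N/2 - cf) / f) * i."""
    n = sum(freqs)
    cf = 0  # cumulative frequency of the classes before the median class
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= n / 2:  # first class whose cumulative frequency reaches N/2
            return lower + (n / 2 - cf) / f * width
        cf += f

# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 5
print(grouped_median([0, 10, 20, 30], 10, [5, 8, 12, 5]))
```

Here N = 30, so N/2 = 15 falls in the 20-30 class (cf = 13, f = 12), giving 20 + (15 - 13) / 12 × 10 ≈ 21.67.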

Mode - the most frequently occurring value in a set of numbers.
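A small sketch of finding the mode with the Python standard library, using hypothetical shirt sizes. `statistics.multimode` returns every value tied for the highest frequency, which also covers data with more than one mode.

```python
from statistics import multimode

# Hypothetical shirt sizes sold in one day
sizes = ["M", "L", "M", "S", "XL", "M", "L"]
print(multimode(sizes))  # ['M'], the most frequent value

# A bimodal set returns both modes
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```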


8. Measures of Dispersion

Range = $L - S$, where $L$ is the largest value and $S$ the smallest value
Coefficient of range = $\frac{L - S}{L + S}$
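A quick worked example of the range and its coefficient, using hypothetical monthly profits: here L = 30 and S = 10.

```python
# Hypothetical monthly profits (in thousands)
profits = [12, 25, 18, 30, 10]
largest, smallest = max(profits), min(profits)
value_range = largest - smallest                           # L - S
coeff_range = (largest - smallest) / (largest + smallest)  # (L - S) / (L + S)
print(value_range, coeff_range)  # 20 0.5
```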

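Mean Deviation
$MD = \frac{\sum f\,|D|}{N}$, where $|D|$ is the absolute deviation of each value from the mean (or median), $f$ its frequency and $N = \sum f$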
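A sketch of the mean deviation for a discrete series, measuring deviations from the arithmetic mean (the median could be used instead). The data and the function name `mean_deviation` are hypothetical.

```python
def mean_deviation(values, freqs):
    """MD = sum(f * |D|) / N, with D measured from the arithmetic mean."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n

# Mean is 21, |D| = 11, 1, 9, so MD = (2*11 + 5*1 + 3*9) / 10
print(mean_deviation([10, 20, 30], [2, 5, 3]))  # 5.4
```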

Standard Deviation
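The standard deviation is the square root of the average squared deviation from the mean; for a frequency distribution, $\sigma = \sqrt{\frac{\sum f (X - \bar{X})^2}{N}}$. The sketch below computes this population form on hypothetical data (the sample form divides by $N - 1$ instead, as `statistics.stdev` does) and cross-checks it with `statistics.pstdev`. The function name `std_dev` is illustrative.

```python
import math
from statistics import pstdev

def std_dev(values, freqs):
    """Population standard deviation: sqrt(sum(f * (X - mean)^2) / N)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return math.sqrt(variance)

values, freqs = [10, 20, 30], [2, 5, 3]
print(std_dev(values, freqs))  # 7.0
# Cross-check: expand the frequencies and use the standard library
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
print(pstdev(expanded))  # 7.0
```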

9. What is correlation and what are its types?

Correlation - shows whether two variables are related to each other and how strongly they are related.

Correlation can be classified in three different ways:

Positive and Negative
• Positive correlation means both variables vary in the same direction: when one increases, the other also increases.
• Negative correlation means the variables vary in opposite directions: when one increases, the other decreases.

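Simple, Partial and Multiple
• Simple correlation is the study of two variables only.
• Partial correlation involves more than two variables, but we study the relationship between only two of them while keeping the effect of the others constant.
• Multiple correlation is the study of three or more variables together.

Linear and Non-Linear
• Linear correlation means the change in one variable bears a constant ratio to the change in the other, so the points lie close to a straight line.
• Non-linear (curvilinear) correlation means this ratio is not constant, so the points follow a curve.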

Coefficient of Correlation
The coefficient of correlation, r, ranges from -1 to +1.

When r = +1, there is a perfect positive relationship between the variables.

When r = -1, there is a perfect negative relationship between the variables.

When r = 0, there is no linear relationship between the variables.
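A sketch of computing the coefficient of correlation, assuming Karl Pearson's coefficient (the usual choice): the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations. The advertising figures are hypothetical.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of cross-products
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)

ad_spend = [1, 2, 3, 4, 5]
print(pearson_r(ad_spend, [3, 5, 7, 9, 11]))  # 1.0, perfect positive
print(pearson_r(ad_spend, [10, 8, 6, 4, 2]))  # -1.0, perfect negative
print(pearson_r(ad_spend, [2, 4, 6, 4, 2]))   # close to 0: the pattern is curved, not linear
```

The last case shows why r = 0 means "no linear relationship": the values clearly depend on each other, but not along a straight line.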


10. What is regression and what are its uses?

Regression - It describes how the mean value of one variable (the dependent variable) changes with the values of one or more other variables (the independent variables).
It also tells us how one factor affects another.
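A minimal sketch of simple linear regression fitted by ordinary least squares, one common way of estimating the relationship described above. The data and the function name `fit_line` are hypothetical; the slope is the sum of cross-products of deviations divided by the sum of squared deviations of x.

```python
def fit_line(x, y):
    """Least-squares estimates for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx  # the fitted line passes through (mean of x, mean of y)
    return a, b

# Hypothetical data: advertising spend vs. sales
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
a, b = fit_line(ad_spend, sales)
print(a, b)       # 1.0 2.0 -> sales = 1 + 2 * ad_spend
print(a + b * 6)  # predicted sales when ad_spend = 6: 13.0
```

Python 3.10 and later also provide `statistics.linear_regression`, which returns the slope and intercept for the same kind of fit.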

USES OF REGRESSION

• Predictive analysis
• Operational efficiency
• Supporting decisions
• Correcting errors and generating new insights

11. Central limit theorem

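The central limit theorem states that, for a population with finite variance, the distribution of the sample mean approaches a normal distribution as the sample size increases, whatever the shape of the population's own distribution. A common rule of thumb is that a sample size of at least 30 is large enough for this approximation, and the approximation improves as the sample size increases.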
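A small simulation sketch of the theorem, assuming an exponential population with mean 1 and standard deviation 1 (a deliberately skewed choice for illustration). Many samples of size 30 are drawn; their means cluster around the population mean with a spread close to $\sigma / \sqrt{n}$, even though the population itself is far from normal.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)

def sample_mean(n):
    """Mean of n draws from a skewed (exponential, mean 1) population."""
    return mean(rng.expovariate(1.0) for _ in range(n))

n = 30
means = [sample_mean(n) for _ in range(2000)]  # 2000 repeated samples
print(round(mean(means), 2))   # close to the population mean, 1
print(round(stdev(means), 2))  # close to 1 / sqrt(30), about 0.18
```

Plotting `means` as a histogram would show a roughly symmetric bell shape.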

Bell curve - A normal distribution has a symmetric, bell-shaped curve. Plotting the data (for example as a histogram) shows whether it forms a proper bell curve; if it does, the data is approximately normally distributed. Sometimes the curve is skewed instead, either to the left (negatively skewed) or to the right (positively skewed).

12. Other terms

Error term
• It is closely related to the residual, and the two terms are often used interchangeably.
• A residual is the difference between an actual value and the value predicted by the model.
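A tiny sketch of computing residuals as actual minus predicted values. Both lists are hypothetical; in practice the predicted values would come from a fitted model. The results are rounded only to hide floating-point noise.

```python
# Hypothetical actual sales and the values predicted by a fitted model
actual = [3.2, 4.8, 7.1, 9.3, 10.6]
predicted = [3.0, 5.0, 7.0, 9.0, 11.0]
residuals = [round(y - y_hat, 2) for y, y_hat in zip(actual, predicted)]
print(residuals)  # [0.2, -0.2, 0.1, 0.3, -0.4]
```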

Time series
Time series analysis is another statistics topic; it studies data recorded over time to identify trends and, in some cases, to forecast future values.
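One common way to study the trend in a time series is a simple moving average, which smooths short-term fluctuations. A minimal sketch on hypothetical quarterly sales; the function name `moving_average` and the window of 3 are illustrative choices.

```python
def moving_average(series, window):
    """Simple moving average: mean of each run of `window` consecutive values."""
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]

# Hypothetical quarterly sales
sales = [120, 135, 128, 150, 162, 158, 175]
print(moving_average(sales, 3))
```

The smoothed values rise steadily from about 127.7 to 165.0, revealing an upward trend that the raw quarterly figures obscure.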

P-value
The p-value is very important in statistics: it is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. If the p-value is less than the chosen significance level, commonly 0.05, the result is said to be statistically significant and the null hypothesis is rejected. If it is greater, the result is not statistically significant and we fail to reject the null hypothesis, which is not the same as proving it true.
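A minimal sketch of obtaining a two-sided p-value from a one-sample z-test, assuming the population standard deviation is known. The order-value figures and the function name `z_test_p_value` are hypothetical; `statistics.NormalDist().cdf` gives the standard normal cumulative probability.

```python
from statistics import NormalDist

def z_test_p_value(sample_mean, hypothesized_mean, sigma, n):
    """Two-sided p-value of a one-sample z-test (population sigma assumed known)."""
    z = (sample_mean - hypothesized_mean) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: mean order value is claimed to be 500; 36 orders average 520, sigma = 60
p = z_test_p_value(520, 500, 60, 36)
print(round(p, 4))  # z = 2.0, p ≈ 0.0455
print("reject H0 at 5%" if p < 0.05 else "fail to reject H0 at 5%")
```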
