
MEL761: Statistics For Decision Making: About The Course


MEL761: Statistics for Decision Making

About the course
Introduction
Need
Descriptive and Inferential Statistics
Examples
Various Problem Areas

Web site for the course:


http://paniit.iitd.ac.in/~deshmukh/

Dr S G Deshmukh
Mechanical Department, Indian Institute of Technology

Objectives of this course


Appreciate the role of statistics in various decision-making situations.
Summarize data with frequency distributions and graphic presentation.
Interpret descriptive statistics for central tendency, dispersion, and location.
Define and interpret probability.
Utilize discrete and continuous probability distributions to determine probabilities in various managerial applications.
Apply the central limit theorem to determine probabilities of sample means, and compute and interpret point and interval estimates.
Conduct hypothesis tests for means.
Utilize linear regression to estimate and predict variables.
Understand basic concepts of design of experiments.
Understand the importance of non-parametric tests.

Course coverage
Introduction to statistics: definitions and terminology; data classification; data collection techniques; various scales of measurement and their relevance
Descriptive statistics: frequency distributions; measures of central tendency and variation
Probability: basic concepts; multiplication and addition rules; Bayes' rule
Discrete probability distributions: basic concepts; binomial, Poisson, and other discrete distributions
Continuous probability distributions: exponential and other distributions
Normal probability distributions: introductory concepts; the standard normal distribution; central limit theorem; applications of normal distributions; approximations to discrete probability distributions
Correlation and regression analysis: overview of correlation; linear regression
Type I and Type II errors
Confidence intervals: confidence intervals for the mean (large samples and small samples) and for population proportions
Analysis of variance and design of experiments
Non-parametric tests
Case studies and applications to managerial decision making

Evaluation scheme
Component                        Weight
Surprise Quizzes (n numbers)         5%
Minors (2)                          30%
Major                               35%
Lab work / assignments              15%
Mini-Project                        10%
Statistics application review        5%

Learning Objectives
Define statistics
Become aware of a wide range of applications of statistics in business for decision making
Differentiate between descriptive and inferential statistics
Formulate and test various sets of hypotheses
Understand implications of design of experiments

Statistics..
Plays an important role in many facets of human endeavour
Occurs remarkably frequently in our everyday lives
Is often incorrectly thought of as just a collection of data, graphs and diagrams

Statistics in Business
Accounting: auditing and cost estimation
Economics: regional, national, and international economic performance
Finance: investments and portfolio management
Management: human resources, compensation, and quality management
Management Information Systems (ERP): performance of systems which gather, summarize, and disseminate information to various managerial levels
Marketing: market analysis and consumer research
International Business: market and demographic analysis

What is Statistics?
Science of gathering, analyzing, interpreting, and presenting data
Branch of mathematics
One page in Courses of study?
Facts and figures
Measurement taken on a sample
Type of distribution being used to analyze data

Statistics is the scientific method that enables us to make decisions as responsibly as possible.

Statistics
The science of using data to answer research questions in the presence of variation:
1. Formulate a research question(s) (hypothesis)
2. Collect data
3. Analyze and summarize data
4. Draw conclusions to answer the research questions (statistical inference)

Answers Questions from Everyday Life


Business: Will a new marketing strategy be profitable?
Industry: Will a product's life exceed the warranty period?
Medicine: Will this year's flu vaccine reduce the chance of flu?
Education: Will technology improve learning?
Government: Will a change in interest rates affect inflation?

Decision making process..


1. Collecting pertinent information that is as reliable as possible.
2. Selecting the parts of the available information that are most helpful to make rational decisions.
3. Making the actual decisions as sensibly as possible on the basis of the available evidence.
4. Perceiving the risks entailed in the particular decision made, and evaluating the corresponding risks of alternative actions.

Example
Polio Vaccine: Results of the Experiment

Vaccine Group: 57
Non-vaccine Group: 142

Can Statistics Be Trusted?


There are three kinds of lies: Lies, damned lies, and statistics.
--Mark Twain

It is easy to lie with statistics. But it is easier to lie without them.


--Frederick Mosteller

Figures won't lie, but liars will figure.
--Charles Grosvenor


Population Versus Sample


Population: the whole
a collection of persons, objects, or items under study
the entire group of individuals in a statistical study that we want information about

Census: gathering data from the entire population

Sample: a portion of the whole
a subset of the population
a part of the population from which we actually collect information, used to draw conclusions about the whole (statistical inference)

Statistics can be split into two broad categories


1. Descriptive statistics

2. Statistical inference


Descriptive vs. Inferential Statistics


Descriptive Statistics: using data gathered on a group to describe or reach conclusions about that same group only
Inferential Statistics: using sample data to reach conclusions about the population from which the sample was taken

Descriptive statistics..
Encompasses the following:
Graphical or pictorial display
Condensation of large masses of data into a form such as tables
Preparation of summary measures to give a concise description of complex information (e.g. an average figure)
Exhibition of patterns that may be found in sets of information

Inferential Statistics..
Especially relates to:
Determining whether characteristics of a situation are unusual or if they have happened by chance
Estimating values of numerical quantities and determining the reliability of those estimates
Using past occurrences to attempt to predict the future

Statistics: Science of variability..?


Virtually everything varies
Variation occurs among individuals
Variation occurs within any one individual as time passes

Parameter vs. Statistic


Parameter: descriptive measure of the population
Usually represented by Greek letters

Statistic: descriptive measure of a sample
Usually represented by Roman letters

Symbols for Population Parameters


μ denotes population mean
σ² denotes population variance
σ denotes population standard deviation

Symbols for Sample Statistics


x̄ denotes sample mean
S² denotes sample variance
S denotes sample standard deviation

Process of Inferential Statistics


1. Start with the population, whose parameter μ is unknown.
2. Select a random sample from the population.
3. Calculate the statistic x̄ from the sample.
4. Use x̄ to estimate μ.
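The process of inferential statistics can be sketched in a few lines of Python. This is only an illustration: the population below is made-up data (10,000 hypothetical ages), and the names `population`, `sample`, `mu` and `x_bar` are chosen for this sketch rather than taken from the course material.

```python
import random
import statistics

random.seed(761)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 people (made-up data).
population = [random.randint(20, 79) for _ in range(10_000)]
mu = statistics.mean(population)      # parameter: population mean

# Select a random sample and calculate the statistic x-bar.
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)       # statistic: sample mean

print(f"population mean (mu) = {mu:.2f}")
print(f"sample mean (x-bar)  = {x_bar:.2f}")
```

In practice μ is unknown and only x̄ is available; the population is generated here only so that the two values can be compared.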

Levels of Data Measurement


Nominal: lowest level of measurement
Ordinal
Interval
Ratio: highest level of measurement

Nominal Level Data


Numbers are used to classify or categorize

Example: Employment Classification
1 for Educator
2 for Construction Worker
3 for Manufacturing Worker

Example: Ethnicity
1 for African-American
2 for Anglo-American
3 for Hispanic-American

Ordinal Level Data


Numbers are used to indicate rank or order
Relative magnitude of numbers is meaningful
Differences between numbers are not comparable

Example: Ranking productivity of employees
Example: Taste test ranking of three brands of soft drink
Example: Position within an organization
1 for President
2 for Vice President
3 for Plant Manager
4 for Department Supervisor
5 for Employee

Example of Ordinal Measurement

[Figure: finishing order at a finish line, with competitors numbered 1, 6, 2, 4, 3, 5]

Ordinal Data
Faculty should receive preferential treatment for parking space in the new Bharati Telecom building.

Strongly Agree / Agree / Neutral / Disagree / Strongly Disagree

Interval Level Data


Distances between consecutive integers are equal
Relative magnitude of numbers is meaningful
Differences between numbers are comparable
Location of origin, zero, is arbitrary
Vertical intercept of unit of measure transform function is not zero

Example: Fahrenheit Temperature
Example: Calendar Time
Example: Monetary Utility
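A short sketch may help show why the intercept of the unit conversion matters. It compares Fahrenheit temperature (interval level, converted to Celsius with a nonzero intercept) against length (ratio level, converted from inches to centimetres with a zero intercept). The specific values 80, 60, 40, 72 and 36 are arbitrary illustrations, not data from the course.

```python
def f_to_c(f):
    """Fahrenheit to Celsius: C = (F - 32) * 5/9, a transform with a nonzero intercept."""
    return (f - 32) * 5 / 9


def inches_to_cm(inches):
    """Inches to centimetres: cm = 2.54 * inches, a transform with a zero intercept."""
    return 2.54 * inches


# Interval level: the ratio of two temperatures depends on the unit used.
ratio_f = 80 / 40                       # 2.0 in Fahrenheit
ratio_c = f_to_c(80) / f_to_c(40)       # about 6.0 in Celsius

# Interval level: ratios of differences are the same in both units.
diff_ratio_f = (80 - 60) / (60 - 40)
diff_ratio_c = (f_to_c(80) - f_to_c(60)) / (f_to_c(60) - f_to_c(40))

# Ratio level: the ratio of two lengths is the same in any unit.
ratio_in = 72 / 36
ratio_cm = inches_to_cm(72) / inches_to_cm(36)

print(ratio_f, round(ratio_c, 2))
print(diff_ratio_f, round(diff_ratio_c, 2))
print(ratio_in, round(ratio_cm, 2))
```

So "80 degrees is twice as hot as 40 degrees" is not a meaningful statement, while "72 inches is twice as long as 36 inches" is.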

Ratio Level Data


Highest level of measurement
Relative magnitude of numbers is meaningful
Differences between numbers are comparable
Location of origin, zero, is absolute (natural)
Vertical intercept of unit of measure transform function is zero

Examples: Height, Weight, and Volume
Example: Monetary Variables, such as Profit and Loss, Revenues, and Expenses
Example: Financial ratios, such as P/E Ratio, Inventory Turnover

Usage Potential of Various Levels of Data


Ratio Interval Ordinal Nominal


Data Level, Operations, and Statistical Methods


Data Level   Meaningful Operations                                                       Statistical Methods
Nominal      Classifying and Counting                                                    Nonparametric
Ordinal      All of the above plus Ranking                                               Nonparametric
Interval     All of the above plus Addition, Subtraction, Multiplication, and Division   Parametric
Ratio        All of the above                                                            Parametric

Visual presentation of data


Data preparation rules


Data presented must be:
factual
relevant

Before presentation, always check:
the source of the data
that the data has been accurately transcribed
that the figures are relevant to the problem

Methods of visual presentation of data


Table

         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East        20.4      27.4      90        20.4
West        30.6      38.6      34.6      31.6
North       45.9      46.9      45        43.9

Graphs
[Figure: chart of East, West, and North by quarter (1st Qtr to 4th Qtr); vertical axis 0 to 90]

Pie chart
[Figure: pie chart with one slice per quarter (1st Qtr to 4th Qtr)]

Multiple bar chart
[Figure: horizontal bars for East, West, and North in each quarter; horizontal axis 0 to 100]

Simple pictogram
[Figure: pictogram of East, West, and North by quarter; vertical axis 0 to 100]

Frequency distributions
Frequency tables

Observation Table

Class Interval   Frequency   Cumulative Frequency
< 20                    13                     13
< 40                    18                     31
< 60                    25                     56
< 80                    15                     71
< 100                    9                     80

Frequency diagrams
[Figure: bar chart of Frequency (axis 0 to 30) and line chart of Cumulative Frequency (axis 0 to 90) for the class intervals < 20, < 40, < 60, < 80, < 100]

Ungrouped Versus Grouped Data


Ungrouped data
have not been summarized in any way
are also called raw data

Grouped data
have been organized into a frequency distribution

Example of Ungrouped Data


Ages of a Sample of Managers from XYZ

42   26   32   34   57
30   58   37   50   30
53   40   30   47   49
50   40   32   31   40
52   28   23   35   25
30   36   32   26   50
55   30   58   64   52
49   33   43   46   32
61   31   30   40   60
74   37   29   43   54

Frequency Distribution of Ages


Class Interval   Frequency
20-under 30              6
30-under 40             18
40-under 50             11
50-under 60             11
60-under 70              3
70-under 80              1
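As a rough sketch, the frequency distribution can be tallied from the 50 raw ages with plain Python. The variable names (`ages`, `class_width`, `frequencies`) are illustrative choices; each class includes its lower endpoint and excludes its upper endpoint, matching the "20-under 30" labels.

```python
# Ages of the sample of managers (the 50 ungrouped values from the slides).
ages = [
    42, 30, 53, 50, 52, 30, 55, 49, 61, 74,
    26, 58, 40, 40, 28, 36, 30, 33, 31, 37,
    32, 37, 30, 32, 23, 32, 58, 43, 30, 29,
    34, 50, 47, 31, 35, 26, 64, 46, 40, 43,
    57, 30, 49, 40, 25, 50, 52, 32, 60, 54,
]

class_width = 10
frequencies = {}
for lo in range(20, 80, class_width):          # lower endpoints 20, 30, ..., 70
    hi = lo + class_width
    # Count values with lo <= age < hi ("lo-under hi").
    frequencies[f"{lo}-under {hi}"] = sum(lo <= a < hi for a in ages)

for label, count in frequencies.items():
    print(f"{label:<12}{count:>3}")
```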

Data Range
42 30 53 50 52 30 55 49 61 74
26 58 40 40 28 36 30 33 31 37
32 37 30 32 23 32 58 43 30 29
34 50 47 31 35 26 64 46 40 43
57 30 49 40 25 50 52 32 60 54

Smallest = 23, Largest = 74
Range = Largest - Smallest = 74 - 23 = 51

Number of Classes and Class Width


The number of classes should be between 5 and 15.
Fewer than 5 classes cause excessive summarization.
More than 15 classes leave too much detail.

Class Width
Divide the range by the number of classes for an approximate class width.
Round up to a convenient number.

Approximate Class Width = 51 / 6 = 8.5
Class Width = 10
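The class-width arithmetic can be checked with a small sketch. Note the assumption: "a convenient number" is interpreted here as the next multiple of 5, which is one possible reading and not something the slides define.

```python
import math

smallest, largest = 23, 74      # extremes of the ages data
num_classes = 6

data_range = largest - smallest               # 74 - 23 = 51
approx_width = data_range / num_classes       # 51 / 6 = 8.5

# Assumption for this sketch: "convenient" means the next multiple of 5.
class_width = math.ceil(approx_width / 5) * 5

print(data_range, approx_width, class_width)
```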

Class Midpoint
Class Midpoint = (beginning class endpoint + ending class endpoint) / 2
               = (30 + 40) / 2
               = 35

Class Midpoint = class beginning point + (1/2) × class width
               = 30 + (1/2) × 10
               = 35
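Both midpoint formulas give the same result for every class, as the following sketch suggests. The list `lower_endpoints` and the two result names are illustrative; the class width of 10 comes from the slides.

```python
class_width = 10
lower_endpoints = [20, 30, 40, 50, 60, 70]

# Formula 1: average of the beginning and ending class endpoints.
midpoints_avg = [(lo + (lo + class_width)) / 2 for lo in lower_endpoints]

# Formula 2: class beginning point plus half the class width.
midpoints_half = [lo + class_width / 2 for lo in lower_endpoints]

print(midpoints_avg)
print(midpoints_half)
```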

Relative Frequency
Class Interval   Frequency   Relative Frequency
20-under 30              6   6/50 = .12
30-under 40             18   18/50 = .36
40-under 50             11   .22
50-under 60             11   .22
60-under 70              3   .06
70-under 80              1   .02
Total                   50   1.00

Cumulative Frequency
Class Interval   Frequency   Cumulative Frequency
20-under 30              6   6
30-under 40             18   18 + 6 = 24
40-under 50             11   11 + 24 = 35
50-under 60             11   46
60-under 70              3   49
70-under 80              1   50
Total                   50

Class Midpoints, Relative Frequencies, and Cumulative Frequencies


Class Interval   Frequency   Midpoint   Relative Frequency   Cumulative Frequency
20-under 30              6         25                  .12                      6
30-under 40             18         35                  .36                     24
40-under 50             11         45                  .22                     35
50-under 60             11         55                  .22                     46
60-under 70              3         65                  .06                     49
70-under 80              1         75                  .02                     50
Total                   50                            1.00

Cumulative Relative Frequencies


Class Interval   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
20-under 30              6                  .12                      6                             .12
30-under 40             18                  .36                     24                             .48
40-under 50             11                  .22                     35                             .70
50-under 60             11                  .22                     46                             .92
60-under 70              3                  .06                     49                             .98
70-under 80              1                  .02                     50                            1.00
Total                   50                 1.00
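The relative, cumulative, and cumulative relative columns all follow from the class frequencies. A minimal sketch, using the frequencies from the ages example and illustrative variable names:

```python
from itertools import accumulate

frequencies = [6, 18, 11, 11, 3, 1]      # classes 20-under 30 ... 70-under 80
total = sum(frequencies)                  # 50

# Relative frequency: class frequency divided by the total.
relative = [round(f / total, 2) for f in frequencies]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: running total divided by the total.
cumulative_relative = [round(c / total, 2) for c in cumulative]

print(relative)
print(cumulative)
print(cumulative_relative)
```

Dividing the running total by 50, rather than summing the rounded relative frequencies, avoids accumulating rounding error.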

Common Statistical Graphs


Histogram -- vertical bar chart of frequencies
Frequency Polygon -- line graph of frequencies
Ogive -- line graph of cumulative frequencies
Pie Chart -- proportional representation for categories of a whole
Stem and Leaf Plot
Pareto Chart
Scatter Plot

Histogram
Class Interval   Frequency
20-under 30              6
30-under 40             18
40-under 50             11
50-under 60             11
60-under 70              3
70-under 80              1

[Figure: histogram of the frequencies; horizontal axis: Years (10 to 80), vertical axis: Frequency (0 to 20)]
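For a quick look at the shape of the distribution without a plotting library, a text histogram can be printed. This is only a rough stand-in for the chart: the bars run horizontally rather than vertically, and the names `frequencies` and `lines` are illustrative.

```python
frequencies = {
    "20-under 30": 6,
    "30-under 40": 18,
    "40-under 50": 11,
    "50-under 60": 11,
    "60-under 70": 3,
    "70-under 80": 1,
}

# One row per class: the label, a bar of asterisks, and the count.
lines = [f"{label:<12} {'*' * count} ({count})"
         for label, count in frequencies.items()]
print("\n".join(lines))
```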

Histogram Construction
[Figure: histogram under construction for the frequency distribution of ages; horizontal axis: Years (10 to 80), vertical axis: Frequency]

Frequency Polygon
[Figure: frequency polygon for the frequency distribution of ages; horizontal axis: Years (10 to 80), vertical axis: Frequency (0 to 20)]

Ogive
Class Interval   Cumulative Frequency
20-under 30                         6
30-under 40                        24
40-under 50                        35
50-under 60                        46
60-under 70                        49
70-under 80                        50

[Figure: ogive; horizontal axis: Years (0 to 80), vertical axis: Frequency (0 to 60)]

Relative Frequency Ogive


Class Interval   Cumulative Relative Frequency
20-under 30                                .12
30-under 40                                .48
40-under 50                                .70
50-under 60                                .92
60-under 70                                .98
70-under 80                               1.00

[Figure: relative frequency ogive; horizontal axis: Years (0 to 80), vertical axis: Cumulative Relative Frequency (0.00 to 1.00)]

Complaints by Passengers
COMPLAINT             NUMBER   PROPORTION   DEGREES
Stations, etc.        28,000          .40     144.0
Train Performance     14,700          .21      75.6
Equipment             10,500          .15      54.0
Personnel              9,800          .14      50.4
Schedules, etc.        7,000          .10      36.0
Total                 70,000         1.00     360.0
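The proportion and degrees columns of a pie chart table can be recomputed from the counts: each proportion is the count divided by the total, and each angle is the proportion times 360. A sketch under those definitions, with illustrative names (`complaints`, `rows`):

```python
complaints = {
    "Stations, etc.": 28_000,
    "Train Performance": 14_700,
    "Equipment": 10_500,
    "Personnel": 9_800,
    "Schedules, etc.": 7_000,
}

total = sum(complaints.values())          # 70,000

rows = {}
for name, number in complaints.items():
    proportion = number / total
    degrees = proportion * 360            # share of the full circle
    rows[name] = (round(proportion, 2), round(degrees, 1))

for name, (proportion, degrees) in rows.items():
    print(f"{name:<18}{proportion:>6.2f}{degrees:>8.1f}")
```

A useful check on any such table is that the degrees column sums to 360.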

Complaints by Passengers
[Figure: pie chart of complaints: Stations, Etc. 40%; Train Performance 21%; Equipment 15%; Personnel 14%; Schedules, Etc. 10%]

Second Quarter Truck Production

Company   2d Quarter Truck Production
A                             357,411
B                             354,936
C                             160,997
D                              34,099
E                              12,747
Totals                        920,190

Second Quarter Truck Production


[Figure: pie chart of second quarter truck production with slices of 39%, 39%, 17%, 4%, and 1%]

Pie Chart Calculations for Company A


Company   2d Quarter Truck Production   Proportion   Degrees
A                             357,411         .388       140
B                             354,936         .386       139
C                             160,997         .175        63
D                              34,099         .037        13
E                              12,747         .014         5
Totals                        920,190        1.000       360

Proportion for Company A = 357,411 / 920,190 = .388
Degrees for Company A = .388 × 360 = 140

Pareto Chart
[Figure: Pareto chart for the categories Poor Wiring, Short in Coil, Defective Plug, and Other; left axis: Frequency (0 to 100), right axis: percentage (0% to 100%)]

Scatter Plot
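Registered Vehicles (1000's)   Gasoline Sales (1000's of Gallons)
                           5                                   60
                          15                                  120
                           9                                   90
                          15                                  140
                           7                                   60

[Figure: scatter plot; horizontal axis: Registered Vehicles (axis ticks 10, 15, 20 recovered), vertical axis: Gasoline Sales (axis ticks 100, 200 recovered)]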
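A scatter plot shows whether two variables move together; one common way to put a number on that is the Pearson correlation coefficient. The slides do not compute it here, so the following is an illustrative extension using the five data pairs above, with variable names chosen for this sketch.

```python
import math

vehicles = [5, 15, 9, 15, 7]          # registered vehicles (1000's)
gasoline = [60, 120, 90, 140, 60]     # gasoline sales (1000's of gallons)

n = len(vehicles)
mean_x = sum(vehicles) / n
mean_y = sum(gasoline) / n

# Sums of squares and cross-products of deviations from the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(vehicles, gasoline))
sxx = sum((x - mean_x) ** 2 for x in vehicles)
syy = sum((y - mean_y) ** 2 for y in gasoline)

r = sxy / math.sqrt(sxx * syy)        # Pearson correlation coefficient
print(f"r = {r:.3f}")
```

A value of r close to +1 indicates a strong positive linear relationship between registered vehicles and gasoline sales in this small data set.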