DS II Mid Term 2017 Solution

Decision Sciences II
Mid-Term Examination
Wednesday, October 25, 2017
Time : 180 minutes
Total No. of Pages : 18 Name ________________________
Total No. of Questions: 3 Roll No. ________________________
Total marks: Section ________________________
Instructions
1. This is a closed book exam. You are NOT allowed to use text book and class notes.
2. Answer all questions only in the space provided following the question.
3. Show all work and give adequate explanations to get full credit.
4. You may use the backside of the last page for rough work only if needed. Do NOT attach any rough
work/sheets.
5. Encircle or underline your final answer for each part.
6. No clarifications will be made during the exam.
7. Assume 95% confidence level if necessary (α = 0.05).
8. Use approximate critical values for Z, t, F, and χ² tests if the exact value is not available in the tables attached
with the question paper.
Question Number   Q1   Q2   Q3   Total
Max Marks
Marks Scored
Question 1 (20 points)
Per Capita Income of 20 countries were analysed using the variables described in Table 1.
Table 1. Data Dictionary

S.No  Variable                               Variable Type             Code in SPSS output
1     Per Capita Income                      Numerical (in Dollars)    Per Capita
2     Corruption Index (higher value         Integer                   CI
      indicates lower level of corruption
      in the country)
3     Gini Index (measure of wealth          Numerical                 Gini
      distribution and discrimination)
4     Communist State (whether the country   Binary (1 = Communist     CS
      was/is a communist state)              State; 0 otherwise)
A simple linear regression model was developed between Per Capita Income and Corruption Index (CI). SPSS output of the
model is shown in Tables 2-5 and Figures 1-2.
Descriptive Statistics are shown in Table 2:
Table 2 Descriptive Statistics

                     N    Minimum    Maximum    Mean         Std. Deviation
CI                   20   29.0       90.0       61.700       20.6171
Gini                 20   23.5       53.7       34.740       7.3846
CS                   20   .0         1.0        .250         .4443
Per Capita           20   12275.0    69249.0    37789.050    15847.4829
Valid N (listwise)   20
Table 3 Correlations
CI Gini CS PerCapita
CI 1 -.464* -.612** .862**
Gini -.464* 1 .253 -.338
CS -.612** .253 1 -.556*
Per Capita .862** -.338 -.556* 1
Model 1
Y (Per Capita) = β0 + β1 x CI
SPSS model outputs are shown in Tables 4 and 5. Normal P-P Plot and Residual Plot are shown in Figures 1
and 2.
Table 4 Model Summary(b)

Model   R   R Square   Adjusted R Square   Std. Error of the Estimate
1                                          8241.4390
a. Predictors: (Constant), CI
b. Dependent Variable: Per Capita
Table 5 Coefficients(a)

Model            Unstandardized Coefficients    Standardized Coefficients    t        Sig.
                 B            Std. Error        Beta
1  (Constant)    -3112.753    5950.818                                       -.523    .607
   CI            662.914      91.706            .862
a. Dependent Variable: Per Capita
Figure 1 Normal P-P Plot of standardized residuals

Figure 2. Residual Plot
Question 1.1 (1 point)

What proportion of the variation in per capita income is explained by corruption index (CI)?
Portion of variation in per capita income explained by CI = R2 = (correlation of CI and per-capita)^2 = 0.862^2
= 0.743
Question 1.2 (1 point)

Is there a statistically significant relationship between corruption index and per capita income of the countries
at 5% significance?
α = 0.05, α/2 = 0.025
t = b1 / sb1 = 662.914 / 91.706 = 7.229
From the t distribution with df = 20 - 2 = 18, t0.025 = 2.101

|t| = 7.229 > t0.025 = 2.101, so H0: β1 = 0 is rejected.
So there is a statistically significant relationship between CI and per capita income at 5% significance
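The slope test can be checked with a few lines of Python. This is a minimal sketch: the coefficient and its standard error are read from Table 5, the critical value 2.101 is taken from a t-table rather than computed, and the variable names are illustrative.

```python
# Values read from Table 5 of the SPSS output.
b1 = 662.914      # estimated slope for CI
se_b1 = 91.706    # standard error of the slope
n = 20            # number of countries
df = n - 2        # degrees of freedom in simple linear regression

# Assumption: two-tailed critical value t(0.025, 18) looked up in a t-table.
t_crit = 2.101

t_stat = b1 / se_b1
significant = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, df = {df}, reject H0 (beta1 = 0): {significant}")
```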

Question 1.3 (2 points)

Is it possible to conclude that the per capita income increases by at least 500 dollars for every one unit increase
in corruption index at 10% significance level? Clearly write all the steps.
Ans: Hypotheses (one-tailed test, with the claim to be established placed in Ha):
H0: β1 <= 500
Ha: β1 > 500
Using the t-test:
$t = \frac{\hat{\beta}_1 - \beta_1}{S_e(\hat{\beta}_1)}$
where $\hat{\beta}_1$ = 662.914 and the hypothesised value $\beta_1$ = 500
Standard error of $\hat{\beta}_1$ = 91.706
tval = (662.914 - 500) / 91.706 = 1.776
For df = 18 and a 10% significance level (one tail), t0.10 = 1.330
As tval > tcritical, reject the null hypothesis.
So, at the 10% significance level, per capita income increases by at least 500 dollars for every one unit increase in
corruption index (the data support β1 > 500).
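A short sketch of the one-tailed test against the hypothesised slope of 500. The estimate and standard error come from Table 5; the one-tailed critical value 1.330 for df = 18 is a table lookup, and the names are illustrative.

```python
# Estimate and standard error from Table 5.
b1_hat = 662.914
se_b1 = 91.706
beta_null = 500.0   # hypothesised slope under H0: beta1 <= 500

# Assumption: one-tailed critical value t(0.10, 18) from a t-table.
t_crit = 1.330

t_stat = (b1_hat - beta_null) / se_b1
reject_h0 = t_stat > t_crit   # right-tailed test

print(f"t = {t_stat:.3f}, critical = {t_crit}, reject H0: {reject_h0}")
```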
Question 1.4 (1 Point)

What can you conclude about the model (model 1) based on the plots in Figures 1 and 2?
Figures 1 and 2 indicate that the residuals follow a normal distribution (Figure 1, Normal P-P plot) and show no
pattern that would contradict the error-term assumptions (Figure 2, residual plot).
So our assumptions for the error term e are valid:

- E(e) = 0
- Variance of e is the same for all values of x
- Values of e are independent
- e has a normal distribution
So our regression model is valid
Question 1.5 (3 Points)

What is the minimum average value of per capita at 95% confidence interval when CI = 50?
Per capita income = -3112.753 + 662.914 * CI = -3112.753 + 662.914*50 = 30032.947
A (1 - α)100% confidence interval for E(Y|X):

$\hat{Y}_i \pm t_{\alpha/2}\, S_e \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{SS_X}}$

Here $\hat{Y}_i$ is the estimate of E(Y|X).
Df = 18, α = 0.05, α/2 = 0.025
t0.025=2.101
SS_X = (n - 1) * s_CI^2 = 19 * 20.6171^2 = 8076.23
min value of per capita income

= 30032.947 - 2.101 * 8241.439 * sqroot(1/20 + (50 - 61.700)^2 / 8076.23)
= 30032.947 - 4480.26
= 25552.69
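The lower confidence limit can be reproduced as below. This is a sketch under two stated assumptions: the square root in the interval formula is applied to the whole bracket, and SS_X is recovered from the SPSS sample standard deviation as (n - 1) * s^2. All other inputs are from Tables 2, 4 and 5.

```python
import math

# Regression output (Tables 4 and 5) and descriptive statistics (Table 2).
b0, b1 = -3112.753, 662.914
s_e = 8241.4390                 # standard error of the estimate
n = 20
x_bar, s_x = 61.700, 20.6171    # mean and sample std. deviation of CI
x0 = 50.0                       # CI value of interest

# Assumption: t(0.025, 18) from a t-table.
t_crit = 2.101

y_hat = b0 + b1 * x0
ss_x = (n - 1) * s_x ** 2       # assumed: sum of squares from sample std. deviation
half_width = t_crit * s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_x)
lower = y_hat - half_width

print(f"y_hat = {y_hat:.3f}, half-width = {half_width:.2f}, lower limit = {lower:.2f}")
```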
A second model is developed between Per Capita and Communists States (CS).
Model 2
Y (Per Capita) = β0 + β1 x CS
The output is shown in Table 6.
Table 6 Coefficients

Model            B            Std. Error    Beta     t         Sig.
1  (Constant)    42743.933    3495.319               12.229    .000
   CS            -19831.65    6990.639      -.556    -2.836
a. Dependent Variable: PerCapita
Figure 3 Normal Probability Plot

Figure 4 Residual Plot

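Question 1.6 (2 Points)

Calculate the average per capita of communist states. Clearly write all the steps.
Standardised beta = b1 * s_CS / s_Y, so
b1 = -0.556 * 15847.4829 / .4443

= -19831.65
For a communist state CS = 1, so
Avg per capita = b0 + b1 = 42743.933 - 19831.65 = 22912.28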
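Question 1.7 (2 points)

Is model 2 statistically significant? Use all the information (Table 6, Figures 3 and 4) provided. Clearly write all
the arguments.
From Table 6, |t| = 2.836 for CS exceeds t0.025 = 2.101 (df = 18), so the coefficient of CS is significant at 5%.
However, the model is not valid, as the variation of the residuals is different for different values of CS, as seen in the
scatter plot in Figure 4. There is heteroscedasticity, so the significance test cannot be relied upon.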
A stepwise regression model is developed and the outputs are shown in Table 7.

Model            Unstandardized Coefficients    t        Sig.    Correlations
                 B             Std. Error                        Zero-order    Partial    Part
1  (Constant)    -3112.753     5950.818         -.523    .607
   CI            662.914       91.706           7.229    .000    .862          .862       .862
2  (Constant)    -10781.284    14572.250        -.740    .469
   CI            691.235       105.487          6.553    .000    .862          .846       .797
   Gini                                                          -.338         .139       .070

Question 1.8 (2 Points)

What is the value of R-square after adding the variable Gini to the model?
R2 = (correlation of CI and Per Capita)^2 + (part correlation of Gini)^2 = .862^2 + .07^2 = 0.743 + 0.0049 = 0.7479

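Question 1.9 (2 points)

Carry out an appropriate hypothesis test to check whether the variable “Gini” is worth adding to the model at
10% significance.
Model 1 (CI only) R2 = 0.862^2 = 0.743 = R2reduced

Model 2 (CI and Gini) R2 = 0.7479 = R2full
Perform a partial F test between models 1 and 2 of Table 7.
Hypothesis: H0: BetaGini = 0; Ha: BetaGini ≠ 0
F = ((R2full - R2reduced) / 1) / ((1 - R2full) / (n - k - 1)) = 0.0049 / (0.2521 / 17) = 0.33
where n = 20 and k = 2 predictors in the full model.
Critical value: F0.10 with (1, 17) df = 3.03
As F = 0.33 < 3.03, H0 is not rejected. So it is not worth adding Gini to the model at 10% significance.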
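The partial F test can be sketched as follows, building the full-model R2 from the part correlation of Gini in Table 7. The helper name partial_f is hypothetical, and the critical value 3.03 for F(0.10; 1, 17) is assumed from an F-table rather than computed.

```python
def partial_f(r2_full, r2_reduced, n, k_full, q):
    """Partial F statistic for q predictors added to reach a full model with k_full predictors."""
    numerator = (r2_full - r2_reduced) / q
    denominator = (1.0 - r2_full) / (n - k_full - 1)
    return numerator / denominator

# R2 of the reduced model (CI only) and the increment from Gini's part correlation.
r2_reduced = 0.862 ** 2
r2_full = r2_reduced + 0.070 ** 2

# Assumption: F(0.10; 1, 17) looked up in an F-table.
f_crit = 3.03

f_stat = partial_f(r2_full, r2_reduced, n=20, k_full=2, q=1)
worth_adding = f_stat > f_crit

print(f"R2 full = {r2_full:.4f}, F = {f_stat:.3f}, worth adding Gini: {worth_adding}")
```

The square root of this F (about 0.575) is close to the t value of .579 reported for Gini in Table 9, which is a useful consistency check.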

Question 1.10 (2 points)

Calculate the variance inflation factor between variables CI and Gini. What can you conclude from the
calculated VIF value?
Regressing Gini on CI (the only other independent variable),
R2 = (correlation between CI and Gini)^2

= (-0.464)^2
= 0.215
VIF = 1 / (1 - R2)
= 1 / (1 - 0.215)
= 1.274
VIF < 10 (and close to 1)
So there is no serious multicollinearity between CI and Gini
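With only two predictors, the VIF follows directly from their pairwise correlation. A minimal sketch, using the correlation from Table 3 and the common rule-of-thumb threshold of 10 used in the answer:

```python
# Pairwise correlation between CI and Gini from Table 3.
r_ci_gini = -0.464

r2 = r_ci_gini ** 2          # R2 from regressing one predictor on the other
tolerance = 1.0 - r2         # SPSS reports this as "Tolerance"
vif = 1.0 / tolerance

# Rule-of-thumb threshold used in the answer above.
serious_multicollinearity = vif >= 10

print(f"R2 = {r2:.3f}, tolerance = {tolerance:.3f}, VIF = {vif:.3f}")
```

The computed tolerance of about 0.785 matches the value shown for Gini in Table 9.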
A stepwise regression model is developed using all the 3 independent variables and the SPSS outputs are given
in Tables 8 and 9.
Table 8 Coefficients

Model            B            Std. Error    Beta    t        Sig.
1  (Constant)    -3112.753    5950.818              -.523    .607
   CI            662.914      91.706        .862    7.229    .000
Table 9 Excluded Variables(a)

Model      Beta In     t        Sig.    Partial Correlation    Collinearity Statistics: Tolerance
1  Gini    .079(b)     .579     .570    .139                   .785
   CS      -.044(b)    -.287    .777    -.070                  .625
b. Predictors in the Model: (Constant), CI
Based on the information provided in Tables 8 and 9, is it possible to conclude that there is no statistically
significant relationship between Per Capita and Gini and CS?
Gini and CS are excluded from the model because their p-values (.570 and .777) are large. So, it is possible
to conclude only that there is no statistically significant relationship between Per Capita and Gini or CS once the
model already includes CI.
Taken on their own, these variables may still be significantly related to Per Capita: in Table 3 the correlation of CS
with Per Capita (-.556) is flagged as significant, while that of Gini (-.338) is not.
Question 2 (20 points)

Applicants who apply for a job at Precision Watches Inc., which requires extensive manual assembly of small
intricate parts, are initially given three different tests to measure their manual dexterity. The ones who are hired
are then periodically given a performance rating on a 0-100 scale that combines their speed and accuracy in
performing the required assembly operations. Data is collected on the test scores and performance ratings for a
randomly selected group of 80 employees who continued working for the company. Their seniority (months with
the company) at the time of the performance rating is also noted. The summary information and the results from
four regression models developed using the data are given below:
Pairwise Correlation Matrix

             JobPerf    Seniority    Test1    Test2    Test3
JobPerf      1
Seniority    0.43       1.00
Test1        0.58                    1.00
Test2        0.52                    0.60     1.00
Test3        0.62                    0.66     0.80     1.00
Descriptive Statistics
                      N     Minimum    Maximum    Mean     Std. Deviation
JobPerf               80    38         100        65.75    10.630
Seniority             80    7          30         18.89    5.00
Test1                 80    31         82         60.53    9.576
Test2                 80    37         86         60.75    9.872
Test3                 80    26         77         50.71    9.181
Valid N (listwise)    80
Model 1 Summary
Model    R    R Square    Adjusted R Square    Std. Error of the Estimate    Durbin-Watson
1                         .176                 9.651                         1.856
a Predictors: (Constant), Seniority
b Dependent Variable: JobPerf
ANOVA
Model            Sum of Squares    df    Mean Square    F         Sig.
1  Regression    1662.584          1     1662.584       17.852    .000
   Residual      7264.416          78    93.134
   Total         8927.000          79
Coefficients
Model            Unstandardized Coefficients    Standardized Coefficients    t         Sig.
                 B         Std. Error           Beta
1  (Constant)    48.928    4.125                                             11.861    .000
   Seniority     .891                           .432
Model 2 Summary
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate    Durbin-Watson
1        .764    .583        .561                 7.042                         1.878
a Predictors: (Constant), Test3, Seniority, Test1, Test2
ANOVA
Model            Sum of Squares    df    Mean Square    F         Sig.
1  Regression    5208.110          4     1302.027       26.258    .000
   Residual      3718.890          75    49.585
   Total         8927.000          79
Coefficients
Model            Unstandardized Coefficients    Standardized Coefficients    t        Sig.    Collinearity Statistics
                 B        Std. Error            Beta                                          Tolerance    VIF
1  (Constant)    6.557    6.187                                              1.060    .293
   Seniority     .801     .155                  .388                         5.171    .000    .986         1.014
   Test1         .300     .112                  .271                         2.693    .009    .550         1.819
   Test2         .086     .135                  .080                         .640     .524    .355         2.816
   Test3         .407     .154                  .352                         2.638    .010    .313         3.197
Model 3 Summary
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate    Durbin-Watson
1        .762    .581        .565                 7.014                         1.891
a Predictors: (Constant), Test3, Seniority, Test1
ANOVA
Model            Sum of Squares    df    Mean Square    F         Sig.
1  Regression    5187.803          3     1729.268       35.148    .000
   Residual      3739.197          76    49.200
   Total         8927.000          79
Coefficients
Model            Unstandardized Coefficients    Standardized Coefficients    t        Sig.    Collinearity Statistics
                 B        Std. Error            Beta                                          Tolerance    VIF
1  (Constant)    7.893    5.801                                              1.361    .178
   Seniority     .793     .154                                               5.157    .000    .993         1.008
   Test1         .312     .110                                               2.844    .006    .565         1.771
   Test3         .473     .114                                               4.145    .000    .567         1.764
Model 4 Summary
Model    R       R Square    Adjusted R Square    Std. Error of the Estimate    Durbin-Watson
1        .757    .574        .562                 7.031                         1.843
a Predictors: (Constant), AvgScore, Seniority
ANOVA
Model            Sum of Squares    df    Mean Square    F         Sig.
1  Regression    5120.011          2     2560.006       51.779    .000
   Residual      3806.989          77    49.441
   Total         8927.000          79
Coefficients
Model            Unstandardized Coefficients    Standardized Coefficients    t        Sig.    Collinearity Statistics
                 B        Std. Error            Beta                                          Tolerance    VIF
1  (Constant)    5.407    6.010                                              .900     .371
   Seniority     .821     .154                  .398                         5.339    .000    .997         1.003
   AvgScore      .782     .094                  .623                         8.362    .000    .997         1.003
Use the information given above to answer the following questions. Specify the model(s) you use to draw
your conclusions, where relevant.
a) Can it be concluded that performance rating improves with length of stay with the company (Seniority),
irrespective of the original test scores? Select the appropriate model to answer the question.
(3 points)
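For model 1,
Rating = 48.928 + .891 * Seniority
The slope is positive and the model is significant (F = 17.852, Sig. = .000). Seniority also remains significant when the
test scores are included (Model 2: B = .801, t = 5.171, Sig. = .000).
So we conclude that performance rating improves with length of stay with the company (Seniority), irrespective
of the original test scores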
b) Predict the average performance rating for a worker who has 15 months of Seniority. What are the
highest and lowest performance ratings that this worker is likely to get? (3 Points)
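From model 1
Rating = 48.928 + .891 * Seniority = 48.928 + .891 * 15
= 62.293
Considering α = 0.05, α/2 = 0.025, df = 78

t0.025 = 1.99
A (1 - α)100% confidence interval for E(Y|X):

$\hat{Y}_i \pm t_{\alpha/2}\, S_e \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{SS_X}}$

Here $\hat{Y}_i$ is the estimate of E(Y|X), S_e = 9.651 (Model 1) and SS_X = (n - 1) * s_X^2 = 79 * 5^2 = 1975.
Y = 62.293 +- 1.99 * 9.651 * sqroot(1/80 + (15 - 18.89)^2 / 1975)
= 62.293 +- 2.73
= (59.57, 65.02)
For an individual worker, the highest and lowest likely ratings come from the prediction interval, which adds 1 under
the square root:
Y = 62.293 +- 1.99 * 9.651 * sqroot(1 + 1/80 + (15 - 18.89)^2 / 1975)
= 62.293 +- 19.40
= (42.89, 81.69)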
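The sketch below computes both the confidence interval for the mean rating and the wider prediction interval for an individual worker at 15 months. It assumes the Model 1 standard error of the estimate (9.651), SS_X recovered as (n - 1) * s^2 from the descriptive statistics, and a table value of 1.99 for t(0.025, 78); the names are illustrative.

```python
import math

# Model 1 output and descriptive statistics for Seniority.
b0, b1 = 48.928, 0.891
s_e = 9.651                 # std. error of the estimate, Model 1
n = 80
x_bar, s_x = 18.89, 5.00
x0 = 15.0                   # months of seniority

# Assumption: t(0.025, 78) approximated from a t-table.
t_crit = 1.99

y_hat = b0 + b1 * x0
ss_x = (n - 1) * s_x ** 2   # assumed: sum of squares from sample std. deviation
leverage = 1 / n + (x0 - x_bar) ** 2 / ss_x

ci_half = t_crit * s_e * math.sqrt(leverage)        # mean response
pi_half = t_crit * s_e * math.sqrt(1 + leverage)    # individual worker

print(f"y_hat = {y_hat:.3f}")
print(f"CI for mean:       ({y_hat - ci_half:.2f}, {y_hat + ci_half:.2f})")
print(f"Prediction interval: ({y_hat - pi_half:.2f}, {y_hat + pi_half:.2f})")
```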
c) If Test 2 was used to predict performance scores on its own, is it likely to be a significant predictor of
JobPerf? Justify. In the presence of other 3 variables is it a significant predictor? Why or why not?
(2 Points)
The correlation between Test 2 and JobPerf is 0.52, which for n = 80 gives t = 0.52 * sqroot(78) / sqroot(1 - 0.52^2) = 5.38,
so on its own Test 2 is likely to be a significant predictor of JobPerf.
But in the presence of the other 3 variables it is not a significant predictor, as seen in model 2, where it
has a high p-value (.524). Test 2 is strongly correlated with Test 3 (0.80) and Test 1 (0.60), so it adds little
information beyond them.
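Whether a correlation of 0.52 is significant on its own can be checked with the usual t test for a correlation coefficient. A minimal sketch, with the critical value 1.99 for df = 78 assumed from a t-table:

```python
import math

r = 0.52     # correlation between Test2 and JobPerf (pairwise correlation matrix)
n = 80       # number of employees

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumption: t(0.025, 78) approximated from a t-table.
t_crit = 1.99
significant_alone = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, significant on its own: {significant_alone}")
```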
d) Can it be concluded that employees with higher average scores on the tests stay longer with the
company? Choose the appropriate models to compare. (3 points)
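MODEL 1: Rating = 48.928 + .891 * Seniority
MODEL 4: Rating = 5.407 + .821 * Seniority + .782 * AvgScore
The two equations cannot simply be equated to solve for Seniority, because each describes the fitted rating under a
different set of predictors. What the comparison does show is how Seniority and AvgScore are related:
- The coefficient of Seniority barely changes when AvgScore is added (.891 in Model 1, .821 in Model 4).
- The tolerance of Seniority in Model 4 is .997, so the R2 between Seniority and AvgScore is 1 - .997 = .003,
a correlation of about 0.05, which is far from significant for n = 80.
So we cannot conclude that employees with higher average scores on the tests stay longer with the company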
e) Which factor has the largest impact on performance scores based on Model 3? Explain Clearly.
(2 Points)
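Standardized Beta = B * (std. deviation of the variable) / (std. deviation of JobPerf)
                 Unstandardized Coefficients    Standardized Coefficients
                 B        Std. Error            Beta
(Constant)       7.893    5.801
Seniority        0.793    0.154                 0.793 * 5.00 / 10.630 = 0.373
Test1            0.312    0.110                 0.312 * 9.576 / 10.630 = 0.281
Test3            0.473    0.114                 0.473 * 9.181 / 10.630 = 0.409
So based on the standardised coefficients, Test 3 has the largest impact on the scores, closely followed by Seniority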
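The standardised coefficients for Model 3 can be recomputed from the unstandardised B values and the descriptive statistics. This sketch multiplies each B (not its standard error) by the ratio of standard deviations; the dictionary layout is illustrative.

```python
# Unstandardised coefficients from Model 3.
b = {"Seniority": 0.793, "Test1": 0.312, "Test3": 0.473}

# Standard deviations from the Descriptive Statistics table.
s_x = {"Seniority": 5.00, "Test1": 9.576, "Test3": 9.181}
s_y = 10.630   # std. deviation of JobPerf

# Standardised beta = B * s_x / s_y.
beta = {name: b[name] * s_x[name] / s_y for name in b}
largest = max(beta, key=lambda name: abs(beta[name]))

for name, value in beta.items():
    print(f"{name}: {value:.3f}")
print("Largest standardised impact:", largest)
```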
f) Two variables were added together to Model 1 to obtain Model 3. Have they contributed significantly
as a group in the prediction of Job Performance? (2 Points)
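Model 1 (reduced): SSE = 7264.416, R2 = 0.43^2 = 0.1849

Model 3 (full): SSE = 3739.197 with 76 df, R2 = 0.581
Conduct a partial F test with H0: BetaTest1 = BetaTest3 = 0, using
F = ((SSEreduced - SSEfull) / 2) / (SSEfull / 76) = ((7264.416 - 3739.197) / 2) / 49.200 = 35.83
Assuming a 95% confidence level, the critical value is F0.05 with (2, 76) df = 3.12 (approximately).
As F = 35.83 > 3.12, H0 is rejected, so the two variables contributed significantly as a group in the prediction of
Job Performance.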
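A sketch of the partial F test using the residual sums of squares from the two ANOVA tables. The critical value 3.12 for F(0.05; 2, 76) is an approximate table lookup and is an assumption here.

```python
# Residual sums of squares from the ANOVA tables.
sse_reduced = 7264.416   # Model 1: Seniority only
sse_full = 3739.197      # Model 3: Seniority, Test1, Test3
df_full = 76             # residual df of Model 3
q = 2                    # number of variables added (Test1, Test3)

f_stat = ((sse_reduced - sse_full) / q) / (sse_full / df_full)

# Assumption: F(0.05; 2, 76) approximated from an F-table.
f_crit = 3.12
group_significant = f_stat > f_crit

print(f"F = {f_stat:.2f}, critical = {f_crit}, group significant: {group_significant}")
```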
g) Two employees whose seniority differs by 5 months have the same average test score. Can it be
concluded that the performance rating of the more senior employee will be at least 3 points higher?
(3 Points)
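From Model 4:
                 B        Std. Error    Beta    t        Sig.    Tolerance    VIF
1  (Constant)    5.407    6.010                 .900     .371
   Seniority     .821     .154          .398    5.339    .000    .997         1.003
   AvgScore      .782     .094          .623    8.362    .000    .997         1.003
Rating = 5.407 + 0.821 * Seniority + 0.782 * AvgScore
With the same AvgScore, a 5-month difference in Seniority changes the rating by 5 * BetaSeniority, so the claim is
5 * BetaSeniority >= 3, i.e. BetaSeniority >= 0.6.
H0: BetaSeniority <= 0.6; Ha: BetaSeniority > 0.6 (one-tailed)
t = (0.821 - 0.6) / 0.154 = 1.435
At 5% significance with df = 77 the one-tailed critical value is approximately 1.665 (Z = 1.645).
As 1.435 < 1.665, H0 is not rejected. Equivalently, the lower limit (0.821 - 1.665 * 0.154) * 5 = 2.82 is below 3.
So it cannot be concluded that the performance rating of the more senior employee will be at least 3 points
higher.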
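The check on the Seniority coefficient can be written out as below. It is a sketch that treats the claim as a one-tailed test on the Model 4 slope, with the critical value 1.665 for df = 77 assumed from a t-table; the names are illustrative.

```python
# Seniority coefficient and standard error from Model 4.
b_seniority = 0.821
se_seniority = 0.154
months_apart = 5
target_gain = 3.0

# The claim "at least 3 points over 5 months" is a claim about the slope.
beta_null = target_gain / months_apart   # 0.6 points per month

# Assumption: one-tailed t(0.05, 77) approximated from a t-table.
t_crit = 1.665

t_stat = (b_seniority - beta_null) / se_seniority
lower_gain = (b_seniority - t_crit * se_seniority) * months_apart
can_conclude = t_stat > t_crit

print(f"t = {t_stat:.3f}, lower limit of 5-month gain = {lower_gain:.2f}, conclude: {can_conclude}")
```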
Question 3
A data analytics start up works with political parties during elections. They have got access to voting patterns
from various official sources. They are trying to understand how the percent of votes obtained by the winner is
determined. As a first cut they are using the following data:
% VOTES – the percent of votes polled obtained by the winning candidate
MARGIN – the margin of victory measured in number of votes
Gender – 1 is for Men and 0 for women
College – 1 is for college educated winners and 0 for those who did not go to college.
They run the regression for all 543 elected MPs. The model output is provided below (with some information
missing):
Regression Statistics
Multiple R
R Square
Adjusted R Square
Standard Error
Observations
ANOVA
              df     SS          MS    F    Significance F
Regression
Residual             17104.06
Total         542    36481.89

             Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept    38.59235        0.937225                               36.75129     40.4334106
MARGIN       5.32E-05        2.18E-06                               4.89E-05     5.7463E-05
Gender       1.551306        0.777806                               0.023404     3.07920835
College      -1.47506        0.586995                               -2.62814     -0.3219783
(i) Fill up the Tables above
Regression Statistics
Multiple R           0.729
R Square             0.531
Adjusted R Square    0.529
Standard Error       5.63
Observations         543
(ii)
ANOVA
              df     SS          MS         F         Significance F
Regression    3      19377.83    6459.28    203.55    2.7
Residual      539    17104.06    31.73
Total         542    36481.89

             Coefficients    Standard Error    t Stat        P-value    Lower 95%    Upper 95%
Intercept    38.59235        0.937225          41.177252                36.75129     40.4334106
MARGIN       5.32E-05        2.18E-06          24.40367                 4.89E-05     5.7463E-05
Gender       1.551306        0.777806          1.9944639                0.023404     3.07920835
College      -1.47506        0.586995          -2.5129005               -2.62814     -0.3219783
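The missing cells follow mechanically from the quantities that were given (SST, SSE, n, and the coefficients with their standard errors). A sketch of that fill-in, with illustrative variable names:

```python
import math

# Quantities given in the question.
n, k = 543, 3                 # observations and number of predictors
sst, sse = 36481.89, 17104.06

# ANOVA table.
ssr = sst - sse
msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

# Regression statistics.
r2 = ssr / sst
multiple_r = math.sqrt(r2)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
std_error = math.sqrt(mse)

# t Stat = coefficient / standard error.
coefs = {
    "Intercept": (38.59235, 0.937225),
    "MARGIN": (5.32e-05, 2.18e-06),
    "Gender": (1.551306, 0.777806),
    "College": (-1.47506, 0.586995),
}
t_stats = {name: b / se for name, (b, se) in coefs.items()}

print(f"R = {multiple_r:.3f}, R2 = {r2:.3f}, adj R2 = {adj_r2:.3f}, SE = {std_error:.2f}")
print(f"SSR = {ssr:.2f}, MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
```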
(iii) Assuming that t is significant for any absolute value greater than 1.964 at 5%, are the variables significant?
Yes, all variables are significant as |t| > 1.964 for each of them (the smallest is Gender, with t = 1.994).
(iv) Assuming that F is significant for any value greater than 2.621, is the overall regression significant?
Yes, the overall regression is significant as F = 203.55 > 2.621.
The analytics firm decides to dig a little deeper and looks at two outlying states, UP and AP, one of which has
significantly lower assets per winner and the other significantly higher. Both the new variables are 0-1
variables.
The values for some of the regressions are given below.

Model    Independent Variables                R2
1        MARGIN
2        MARGIN, Gender                       0.52567
3        MARGIN, Gender, College              0.531163
4        MARGIN, Gender, College, UP          0.56051
5        MARGIN, Gender, College, UP, AP      0.581339
(v) What is the part correlation for College in regression 3?

Part correlation for College

= sqroot(R2 of model 3 - R2 of model 2)
= sqroot(0.531163 - 0.52567)
= sqroot(0.00549)
= 0.074 in magnitude; the sign is negative (-0.074) because the coefficient of College is negative.
(vi) Between regression 2 and 5 is it justified to add the additional variables?
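Conduct a partial F test with H0: BetaCollege = BetaUP = BetaAP = 0, using
F = ((R2 of model 5 - R2 of model 2) / 3) / ((1 - R2 of model 5) / (543 - 5 - 1))
= ((0.581339 - 0.52567) / 3) / (0.418661 / 537) = 23.80
Assuming a 95% confidence level, the critical value is F0.05 with (3, 537) df = 2.62 (approximately).
As F = 23.80 > 2.62, H0 is rejected, so the additional variables contributed significantly as a group in the
prediction of % VOTES and it is justified to add them.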
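The comparison of regressions 2 and 5 can be sketched from the R2 values in the table. The helper name partial_f_from_r2 is hypothetical, regression 5 is taken to contain all five predictors, and the critical value 2.62 for F(0.05; 3, 537) is an approximate table lookup.

```python
def partial_f_from_r2(r2_full, r2_reduced, n, k_full, q):
    """Partial F statistic for q predictors added to reach a full model with k_full predictors."""
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / (n - k_full - 1))

r2_model2 = 0.52567     # MARGIN, Gender
r2_model5 = 0.581339    # assumed: MARGIN, Gender, College, UP, AP

# Assumption: F(0.05; 3, 537) approximated from an F-table.
f_crit = 2.62

f_stat = partial_f_from_r2(r2_model5, r2_model2, n=543, k_full=5, q=3)
justified = f_stat > f_crit

print(f"F = {f_stat:.2f}, critical = {f_crit}, additions justified: {justified}")
```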
Regression 5 has a standard error of 5.333135, an overall F value of 149.1324 with significance of 4.4x10-99 .
The standard deviation for the dependent variable is 8.204253. The values of standard deviation for the dependent
and independent variables are given below.
             Coefficients    Standard deviation
Intercept    38.56993
MARGIN       5.58E-05        111365.7
Gender       1.498308        0.311494
College      -1.53774        0.412796
UP           -3.71439        0.354761
AP           5.715821        0.209766
(vii) Which variable has the greatest impact on Voting % ?

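Standardised Beta = Coefficient * (standard deviation of the variable) / (standard deviation of % VOTES = 8.204253)
             Coefficients    Standard deviation    Standardised Beta
Intercept    38.56993
MARGIN       5.58E-05        111365.7              0.757
Gender       1.498308        0.311494              0.057
College      -1.53774        0.412796              -0.077
UP           -3.71439        0.354761              -0.161
AP           5.715821        0.209766              0.146
MARGIN has the greatest impact, as its standardised beta is the largest in absolute value.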
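The standardised betas for regression 5 can be reproduced from the coefficients and standard deviations listed in the question. A minimal sketch with illustrative names:

```python
# Coefficients and standard deviations given for regression 5.
coef = {
    "MARGIN": 5.58e-05,
    "Gender": 1.498308,
    "College": -1.53774,
    "UP": -3.71439,
    "AP": 5.715821,
}
sd_x = {
    "MARGIN": 111365.7,
    "Gender": 0.311494,
    "College": 0.412796,
    "UP": 0.354761,
    "AP": 0.209766,
}
sd_y = 8.204253   # std. deviation of % VOTES

# Standardised beta = coefficient * sd(x) / sd(y); compare absolute values.
std_beta = {name: coef[name] * sd_x[name] / sd_y for name in coef}
greatest = max(std_beta, key=lambda name: abs(std_beta[name]))

for name, value in std_beta.items():
    print(f"{name}: {value:.3f}")
print("Greatest impact:", greatest)
```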
h) Can it be concluded that successive errors in model 4 are positively correlated? (2 points)
Model    Independent Variables                R2
1        MARGIN
2        MARGIN, Gender                       0.52567
3        MARGIN, Gender, College              0.531163
4        MARGIN, Gender, College, UP          0.56051
5        MARGIN, Gender, College, UP, AP      0.581339
Successive errors are correlated when each residual depends on the previous one, so that the residuals are not randomly
distributed.
Model 5 adds a categorical variable AP to model 4.
Model 4 : Y = b0 + MARGIN * b1 + Gender * b2 + College * b3 + UP * b4
Model 5 : Y = b0’ + MARGIN * b1’ + Gender * b2’ + College * b3’ + UP * b4’ + AP * b5’
But AP is a 0-1 variable, so
Model 5 :
Y = b0’ + b5’ + MARGIN * b1’ + Gender * b2’ + College * b3’ + UP * b4’ when AP = 1
= b0’ + MARGIN * b1’ + Gender * b2’ + College * b3’ + UP * b4’ when AP = 0
Comparing models 4 and 5, and assuming the other coefficients are roughly unchanged, we get

b1 = b1’
b2 = b2’
b3 = b3’
b4 = b4’
b0 = weighted average of (b0’ and b0’ + b5’)
So the residuals of model 4 will not be random: they carry an offset that depends on whether AP is 1 or 0.
Therefore, if the observations are ordered by state, successive errors in model 4 are positively correlated. If the question
instead refers to Model 4 of Question 2, its Durbin-Watson statistic of 1.843 is close to 2, which gives no evidence of
positive correlation between successive errors.
