DS II Mid Term 2017 Solution
Decision Sciences II
Mid-Term Examination
Wednesday, October 25, 2017
Time : 180 minutes
Total No. of Pages : 18 Name ________________________
Total No. of Questions: 3 Roll No. ________________________
Total marks: Section ________________________
Instructions
1. This is a closed book exam. You are NOT allowed to use text book and class notes.
2. Answer all questions only in the space provided following the question.
3. Show all work and give adequate explanations to get full credit.
4. You may use the backside of the last page for rough work only if needed. Do NOT attach any rough
work/sheets.
5. Encircle or underline your final answer for each part.
6. No clarifications will be made during the exam.
7. Assume 95% confidence level if necessary (α = 0.05).
8. Use approximate critical values for Z, t, F, and χ² tests if the exact value is not available in the tables attached
with the question paper.
Question Number   Q1   Q2   Q3   Total
Max Marks
Marks Scored
Per Capita Income of 20 countries was analysed using the variables described in Table 1.
A simple linear regression model was developed between Box office collection and budget. SPSS output of the
model is shown in Tables 2-3 and Figures 1-2.
Descriptive Statistics are shown in Table 2:
Table 3 Correlations
CI Gini CS PerCapita
CI 1 -.464* -.612** .862**
Gini -.464* 1 .253 -.338
CS -.612** .253 1 -.556*
Per Capita .862** -.338 -.556* 1
Model 1
Y (Per Capita) = β0 + β1 × CI
SPSS model outputs are shown in Tables 4 and 5. Normal P-P Plot and Residual Plot are shown in Figures 1
and 2.
Table 5 Coefficientsa
                Unstandardized Coefficients   Standardized Coefficients
Model           B           Std. Error        Beta                        t       Sig.
1  (Constant)   -3112.753   5950.818                                      -.523   .607
   CI           662.914     91.706            .862
a. Dependent Variable: Per Capita
Portion of variation in per capita income explained by CI = R2 = (correlation of CI and per-capita)^2 = 0.862^2
= 0.743
So there is a statistically significant relationship between CI and per capita income at 5% significance
Ans: Hypothesis:
H0: β1 ≤ 500
Ha: β1 > 500
Using a t-test:
t = (β̂1 − β1) / Se(β̂1)
where β̂1 = 662.914 and β1 = 500.
Standard error of β̂1 = 91.706
t = (662.914 − 500) / 91.706 = 1.776
For df = n − 2 = 18 and a significance level of 10%, t0.1 = 1.33.
(PS: Use a one-tailed test, as the hypothesis is of the less than / greater than form.)
Since 1.776 > 1.33, H0 is rejected.
So, per capita income increases by at least 500 dollars for every one unit increase in corruption index.
t0.025=2.101
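The one-tailed slope test above can be checked numerically. This is a minimal sketch: the function name is illustrative, the inputs are the Table 5 values, and the critical value is the table value quoted in the answer.

```python
# Minimal sketch of the one-tailed t-test on the CI slope (Model 1).
# slope_t_statistic is an illustrative helper, not part of any SPSS output.

def slope_t_statistic(b_hat: float, b_null: float, se: float) -> float:
    """t statistic for testing an estimated slope against a hypothesised value."""
    return (b_hat - b_null) / se

# Values from Table 5: estimated slope and its standard error.
t_val = slope_t_statistic(b_hat=662.914, b_null=500.0, se=91.706)

# One-tailed critical value at 10% with 18 degrees of freedom, as quoted in the answer.
t_crit_10pct = 1.33

reject_h0 = t_val > t_crit_10pct
print(f"t = {t_val:.3f}, reject H0: {reject_h0}")
```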
A second model is developed between Per Capita and Communist States (CS).
Model 2
Y (Per Capita) = β0 + β1 × CS
Table 6 Coefficientsa
                Unstandardized Coefficients   Standardized Coefficients
Model           B           Std. Error        Beta                        t        Sig.
1  (Constant)   42743.933   3495.319                                      12.229   .000
   CS           -19831.65   6990.639          -.556                       -2.836
a. Dependent Variable: PerCapita
No, the model is not valid: the variance of the residuals differs across values of CS, as seen in the scatter plot in
Fig. 4. There is heteroscedasticity.
Put the values from table 7 and check whether H0 is rejected or not.
VIF = 1/(1 − R²) = 1/(1 − 0.215) = 1.274
Since VIF < 10, multicollinearity is not a concern.
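The VIF arithmetic can be reproduced with a short sketch. The function name is illustrative. The R² of 0.215 is the value used in the answer; it matches the squared CI-Gini correlation (−.464) from Table 3, which is assumed here to be its source.

```python
# Minimal sketch of the variance inflation factor calculation.

def vif(r_squared: float) -> float:
    """VIF for a predictor whose regression on the other predictors has the given R^2."""
    return 1.0 / (1.0 - r_squared)

r_squared = 0.215              # value used in the answer; (-0.464) ** 2 is about 0.215
vif_value = vif(r_squared)

# Rule of thumb used in the answer: VIF below 10 is acceptable.
acceptable = vif_value < 10
print(f"VIF = {vif_value:.3f}, acceptable: {acceptable}")
```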
A stepwise regression model is developed using all the 3 independent variables and the SPSS outputs are given
in Tables 8 and 9.
Table 8 Coefficientsa
                Unstandardized Coefficients   Standardized Coefficients
Model           B           Std. Error        Beta                        t       Sig.
1  (Constant)   -3112.753   5950.818                                      -.523   .607
   CI           662.914     91.706            .862                        7.229   .000
a. Dependent Variable: PerCapita
Question 11 (2 points)
Based on the information provided in Tables 8 and 9, is it possible to conclude that there is no statistically
significant relationship between Per Capita and Gini and CS?
Gini and CS variables are excluded from the model as the p-value for these variables is large. So, it is possible
to conclude that there is no statistically significant relationship between Per Capita and Gini and CS when the
model includes CI.
However, independently there may be a statistically significant relationship between Per Capita and Gini and
CS.
Applicants who apply for a job at Precision Watches Inc., which requires extensive manual assembly of small
intricate parts, are initially given three different tests to measure their manual dexterity. The ones who are hired
are then periodically given a performance rating on a 0-100 scale that combines their speed and accuracy in
performing the required assembly operations. Data is collected on the test scores and performance ratings for a
randomly selected group of 80 employees who continued working for the company. Their seniority (months with
the company) at the time of the performance rating is also noted. The summary information and the results from
four regression models developed using the data are given below:
Descriptive Statistics
                     N    Minimum   Maximum   Mean    Std. Deviation
JobPerf              80   38        100       65.75   10.630
Seniority            80   7         30        18.89   5.00
Test1                80   31        82        60.53   9.576
Test2                80   37        86        60.75   9.872
Test3                80   26        77        50.71   9.181
Valid N (listwise)   80
Model 1 Summary
Model   R   R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1                      .176                9.651                        1.856
a Predictors: (Constant), Seniority
b Dependent Variable: JobPerf
ANOVA
Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression   1662.584         1    1662.584      17.852   .000
   Residual     7264.416         78   93.134
   Total        8927.000         79
Coefficients
                Unstandardized Coefficients   Standardized Coefficients
Model           B        Std. Error           Beta                        t        Sig.
1  (Constant)   48.928   4.125                                            11.861   .000
   Seniority    .891                          .432
Model 2 Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .764   .583       .561                7.042                        1.878
a Predictors: (Constant), Test3, Seniority, Test1, Test2
b Dependent Variable: JobPerf
ANOVA
Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression   5208.110         4    1302.027      26.258   .000
   Residual     3718.890         75   49.585
   Total        8927.000         79
Coefficients
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficient; Tolerance, VIF: collinearity statistics)
Model           B       Std. Error   Beta   t       Sig.   Tolerance   VIF
1  (Constant)   6.557   6.187               1.060   .293
   Seniority    .801    .155         .388   5.171   .000   .986        1.014
   Test1        .300    .112         .271   2.693   .009   .550        1.819
   Test2        .086    .135         .080   .640    .524   .355        2.816
   Test3        .407    .154         .352   2.638   .010   .313        3.197
Model 3 Summary
ANOVA
Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression   5187.803         3    1729.268      35.148   .000
   Residual     3739.197         76   49.200
   Total        8927.000         79
Coefficients
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficient; Tolerance, VIF: collinearity statistics)
Model           B       Std. Error   Beta   t       Sig.   Tolerance   VIF
1  (Constant)   7.893   5.801               1.361   .178
   Seniority    .793    .154                5.157   .000   .993        1.008
   Test1        .312    .110                2.844   .006   .565        1.771
   Test3        .473    .114                4.145   .000   .567        1.764
Model 4 Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .757   .574       .562                7.031                        1.843
a Predictors: (Constant), AvgScore, Seniority
b Dependent Variable: JobPerf
ANOVA
Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression   5120.011         2    2560.006      51.779   .000
   Residual     3806.989         77   49.441
   Total        8927.000         79
Coefficients
(B, Std. Error: unstandardized coefficients; Beta: standardized coefficient; Tolerance, VIF: collinearity statistics)
Model           B       Std. Error   Beta   t       Sig.   Tolerance   VIF
1  (Constant)   5.407   6.010               .900    .371
   Seniority    .821    .154         .398   5.339   .000   .997        1.003
   AvgScore     .782    .094         .623   8.362   .000   .997        1.003
Use the information given above to answer the following questions. Specify the model(s) you use to draw
your conclusions, where relevant.
a) Can it be concluded that performance rating improves with length of stay with the company (Seniority),
irrespective of the original test scores? Select the appropriate model to answer the question.
(3 points)
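For Model 1, the coefficient of Seniority is positive (B = .891) and the regression is significant
(F = 17.852, Sig. = .000 < 0.05).
So we conclude that performance rating improves with length of stay with the company (Seniority), irrespective
of the original test scores.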
b) Predict the average performance rating for a worker who has 15 months of Seniority. What are the
highest and lowest performance ratings that this worker is likely to get? (3 Points)
From Model 1:
Rating = 48.928 + 0.891 × Seniority
= 48.928 + 0.891 × 15
= 62.293
62.293 ± 1.89
= (60.40, 64.185)
c) If Test 2 was used to predict performance scores on its own, is it likely to be a significant predictor of
JobPerf? Justify. In the presence of the other 3 variables, is it a significant predictor? Why or why not?
(2 Points)
Correlation between test 2 and Job Perf is 0.52 so it is likely to be significant predictor of Job Perf.
But in the presence of the other 3 variables it is not a significant predictor as seen in model 2, where it
has a high p value.
d) Can it be concluded that employees with higher average scores on the tests stay longer with the
company? Choose the appropriate models to compare. (3 points)
So we can conclude that employees with higher average scores on the tests stay longer with the company
e) Which factor has the largest impact on performance scores based on Model 3? Explain Clearly.
(2 Points)
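The Beta column is missing from the Model 3 output, so it is computed as
Beta = B × (std. deviation of the variable) / (std. deviation of JobPerf),
using the Descriptive Statistics table (std. deviation of JobPerf = 10.630).
             Unstandardized Coefficients                    Standardized Coefficients
             B       Std. Error          Std. Deviation     Beta
(Constant)   7.893   5.801
Seniority    0.793   0.154               5.00               0.373
Test1        0.312   0.110               9.576              0.281
Test3        0.473   0.114               9.181              0.409
So, based on the standardised coefficients, Test3 has the largest impact on the scores.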
f) Two variables were added together to Model 1 to obtain Model 3. Have they contributed significantly
as a group in the prediction of Job Performance? (2 Points)
Take the values from model 1 and model 3 and compute the significance level. If H0 is rejected, then
they contributed significantly as a group in the prediction of Job Performance. Here, assume 95%
confidence level.
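The comparison described above is a partial F-test. The sketch below computes the statistic from the residual sums of squares in the Model 1 and Model 3 ANOVA tables. The function name is illustrative, and the critical value of about 3.12 for F(2, 76) at the 5% level is an approximate table value, stated here as an assumption.

```python
# Minimal sketch of a partial F-test for variables added as a group.

def partial_f(sse_reduced: float, sse_full: float,
              num_added: int, df_resid_full: int) -> float:
    """F statistic for testing whether the added variables jointly improve the fit."""
    return ((sse_reduced - sse_full) / num_added) / (sse_full / df_resid_full)

# Residual sums of squares from the ANOVA tables above.
sse_model1 = 7264.416   # Model 1: Seniority only
sse_model3 = 3739.197   # Model 3: Seniority, Test1, Test3 (residual df = 76)

f_stat = partial_f(sse_model1, sse_model3, num_added=2, df_resid_full=76)

f_crit_approx = 3.12    # assumed approximate F(2, 76) critical value at alpha = 0.05
reject_h0 = f_stat > f_crit_approx
print(f"partial F = {f_stat:.2f}, reject H0: {reject_h0}")
```

Under these inputs H0 (both added coefficients are zero) is rejected, so Test1 and Test3 contribute significantly as a group.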
g) Two employees whose seniority differs by 5 months have the same average test score. Can it be
concluded that the performance rating of the more senior employee will be at least 3 points higher?
(3 Points)
Model           B       Std. Error   Beta   t       Sig.   Tolerance   VIF
1  (Constant)   5.407   6.010               .900    .371
   Seniority    .821    .154         .398   5.339   .000   .997        1.003
   AvgScore     .782    .094         .623   8.362   .000   .997        1.003
If (0.821-Z*.154)*5 > 3 then, the performance rating of the more senior employee will be at least 3 points
higher.
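The condition above can be evaluated directly. This is a sketch under one assumption: Z is taken as 1.645, the one-sided 5% normal critical value (the corresponding t value with 77 degrees of freedom is about 1.665, which leads to the same outcome). Variable names are illustrative.

```python
# Minimal sketch: one-sided lower bound on the effect of 5 extra months of Seniority.

b_seniority = 0.821    # Model 4 coefficient of Seniority
se_seniority = 0.154   # its standard error
months = 5             # difference in seniority between the two employees
target = 3.0           # required difference in performance rating

z_one_sided = 1.645    # assumed critical value (one-sided, alpha = 0.05)

lower_bound = (b_seniority - z_one_sided * se_seniority) * months
can_conclude = lower_bound > target
print(f"lower bound = {lower_bound:.3f}, at least 3 points higher: {can_conclude}")
```

With these values the lower bound is below 3, so the condition is not met at the 5% level.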
Question 3
A data analytics start up works with political parties during elections. They have got access to voting patterns
from various official sources. They are trying to understand how the percent of votes obtained by the winner is
determined. As a first cut they are using the following data:
College – 1 is for college educated winners and 0 for those who did not go to college.
They run the regression for all 543 elected MPs. The model output is provided below (with few missing
information):
Regression Statistics
Multiple R
R Square
Adjusted R Square
Standard Error
Observations
ANOVA
             df    SS         MS   F   Significance F
Regression
Residual           17104.06
Total        542   36481.89

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept   38.59235       0.937225                            36.75129    40.4334106
MARGIN      5.32E-05       2.18E-06                            4.89E-05    5.7463E-05
Gender      1.551306       0.777806                            0.023404    3.07920835
College     -1.47506       0.586995                            -2.62814    -0.3219783
Regression Statistics
Multiple R 0.726
R Square 0.53
Adjusted R Square 0.527
            Coefficients   Standard Error   t Stat       P-value   Lower 95%   Upper 95%
Intercept   38.59235       0.937225         41.177252              36.75129    40.4334106
MARGIN      5.32E-05       2.18E-06         24.40367               4.89E-05    5.7463E-05
Gender      1.551306       0.777806         1.9944639              0.023404    3.07920835
College     -1.47506       0.586995         -2.5129005             -2.62814    -0.3219783
(iii) Assuming that t is significant for any value greater than 1.964 at 5%, are the variables significant?
Yes all variables are significant as t>1.964
(iv) Assuming that for any value of F greater than 2.621, is the overall regression significant?
Yes the overall regression is significant as F>2.621
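The overall F statistic is not printed in the output above, but it can be recovered from the two sums of squares that are given. The sketch below assumes 3 predictors (MARGIN, Gender, College) and 543 observations, as stated in the question. Variable names are illustrative.

```python
# Minimal sketch: filling in the ANOVA table from the residual and total sums of squares.

sst = 36481.89        # total sum of squares (df = 542)
sse = 17104.06        # residual sum of squares
n_obs = 543           # elected MPs
k = 3                 # predictors: MARGIN, Gender, College

ssr = sst - sse                   # regression sum of squares
df_reg = k
df_res = n_obs - 1 - k            # 539

msr = ssr / df_reg                # mean square, regression
mse = sse / df_res                # mean square, residual
f_stat = msr / mse
r_squared = ssr / sst

f_crit = 2.621                    # critical value given in part (iv)
print(f"R^2 = {r_squared:.2f}, F = {f_stat:.1f}, significant: {f_stat > f_crit}")
```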
The analytics firm decides to dig a little deeper and looks at two outlying states, UP and AP, one of which has
significantly lower assets per winner and the other significantly higher. Both the new variables are 0-1
variables.
Take the values from model 2 and model 5 and compute the significance level. If H0 is rejected, then they
contributed significantly as a group in the prediction of the percent of votes obtained by the winner. Here,
assume 95% confidence level.
Regression 5 has a standard error of 5.333135, an overall F value of 149.1324 with significance of 4.4 × 10^-99.
The standard deviation for the dependent variable is 8.204253. The values of standard deviation for the dependent
and independent variables are given below.
            Coefficients   Standard deviation
Intercept   38.56993
MARGIN      5.58E-05       111365.7
Gender      1.498308       0.311494
College     -1.53774       0.412796
UP          -3.71439       0.354761
AP          5.715821       0.209766
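Standardized coefficient = Coefficient × (standard deviation of the variable) / (standard deviation of the dependent variable)
     Coefficients   Standard deviation   Standardized coefficient
UP   -3.71439       0.355                -0.161
AP   5.715821       0.21                 0.146
Margin has the greatest impact.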
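The standardized coefficients can be recomputed for all five variables from the table of coefficients and standard deviations. This is a minimal sketch; the dictionary layout and variable names are illustrative.

```python
# Minimal sketch: standardized coefficient = B * sd(x) / sd(y) for Regression 5.

SD_Y = 8.204253  # standard deviation of the dependent variable

# (unstandardized coefficient, standard deviation of the variable)
coefficients = {
    "MARGIN":  (5.58e-05, 111365.7),
    "Gender":  (1.498308, 0.311494),
    "College": (-1.53774, 0.412796),
    "UP":      (-3.71439, 0.354761),
    "AP":      (5.715821, 0.209766),
}

betas = {name: b * sd_x / SD_Y for name, (b, sd_x) in coefficients.items()}

# The variable with the largest absolute standardized coefficient has the greatest impact.
largest = max(betas, key=lambda name: abs(betas[name]))

for name, beta in betas.items():
    print(f"{name:8s} {beta: .3f}")
print("Greatest impact:", largest)
```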
h) Can it be concluded that successive errors in model 4 are positively correlated? (2 points)
Correlated successive errors mean that each residual depends on the previous residual, i.e. the residuals are not
randomly distributed.
Model 5 adds a categorical variable AP to model 4.
Model 4: Y = b0 + b1 × MARGIN + b2 × Gender + b3 × College + b4 × UP
Model 5: Y = b0' + b1' × MARGIN + b2' × Gender + b3' × College + b4' × UP + b5' × AP
Model 5:
Y = (b0' + b5') + b1' × MARGIN + b2' × Gender + b3' × College + b4' × UP when AP = 1
Y = b0' + b1' × MARGIN + b2' × Gender + b3' × College + b4' × UP when AP = 0
So the residuals of Model 4 will not be random; their values will be offset depending on whether AP is 1 or 0.