Business Analytics Assignment: Neha Singh
ASSIGNMENT
NEHA SINGH
PGDM Batch – 2019-21
Roll No. & Section: 10 ‘A’
Ohio Education Performance Results Year 2000
District            Writing  Reading  Math  Citizenship  Science  All
Felicity Franklin   52       52       64    64           81       28
St. Bernard         40       81       48    59           41       26
Deer Park           40       69       52    66           43       25
Cincinnati Public   35       63       44    59           50       23
State Averages      60       79       68    79           72       46
Question
The State of Ohio Department of Education has a mandated ninth-grade proficiency test that
covers writing, reading, mathematics, citizenship (social studies), and science. The Excel file Ohio
Education Performance provides data on success rates (defined as the percent of students passing)
in school districts in the greater Cincinnati metropolitan area along with state averages.
a) Suggest the best regression model to predict math success as a function of success in the other
subjects by examining the correlation matrix; then run the regression tool for this set of
variables.
Here, science, reading, and citizenship are the independent variables and math is the dependent variable.
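The predictor screening described above can be sketched in Python. This is a minimal illustration, not the Excel Correlation tool itself: the `pearson` and `correlation_matrix` helpers are hypothetical, and the five-district data are invented rather than taken from the Ohio Education Performance file. The sketch builds the pairwise correlation matrix and ranks the candidate predictors by how strongly each one correlates with math.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(data):
    """Return {(col_i, col_j): r} for every ordered pair of columns."""
    cols = list(data)
    return {(a, b): pearson(data[a], data[b]) for a in cols for b in cols}


# HYPOTHETICAL success rates (percent passing) for five made-up districts.
data = {
    "Math":        [50, 60, 70, 80, 90],
    "Science":     [52, 61, 69, 82, 88],
    "Reading":     [70, 65, 80, 72, 85],
    "Citizenship": [60, 75, 55, 70, 65],
}

corr = correlation_matrix(data)
# Rank candidate predictors by the absolute strength of their correlation with Math.
ranked = sorted((c for c in data if c != "Math"),
                key=lambda c: abs(corr[("Math", c)]), reverse=True)
print(ranked)
```

The same matrix also shows how strongly the candidate predictors correlate with each other, which is the information needed when judging multicollinearity.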
Regression output summary
R-squared (R²) is a statistical measure that represents the proportion of the variance in a dependent
variable that is explained by an independent variable or variables in a regression model. Whereas
correlation measures the strength of the relationship between an independent and a dependent variable,
R-squared measures the extent to which the variance of one variable explains the variance of the second
variable. So, if the R² of a model is 0.50, approximately half of the observed variation can be
explained by the model's inputs.
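To make the definition concrete, here is a small sketch that fits a straight line by least squares and computes R² as 1 minus the ratio of the residual sum of squares to the total sum of squares. The helper names `simple_ols` and `r_squared` and the four data points are hypothetical; Excel's regression tool reports this same quantity as "R Square".

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for the line y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - b1 * mx, b1


def r_squared(x, y):
    """R^2 = 1 - SS_residual / SS_total for the fitted simple regression."""
    b0, b1 = simple_ols(x, y)
    my = sum(y) / len(y)
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# HYPOTHETICAL data, not from the assignment's dataset.
x = [1, 2, 3, 4]
y = [2, 4, 5, 7]
print(round(r_squared(x, y), 4))
```

For a simple regression like this one, R² equals the square of the correlation coefficient r (here r² = 64/65), which is why R is the square root of R².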
R Square 0.874789981
Adjusted R Square 0.870472394
The regression analysis shows that, after eliminating the correlated variables, the R² value is 0.87. This
means that the fitted line explains about 87% of the variation in math success. The value of R (the
square root of R²) is approximately 0.935. Because this regression model is based on only two variables,
it explains the data reasonably well but is not sufficient on its own: even though the model was built
using the correlation matrix, there are still other variables we need to take into account.
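The two figures in the output above can be cross-checked with the standard formulas R = √R² and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). This is a sketch under stated assumptions: the output does not report the number of observations n or predictors k, so the code assumes k = 1 (the single predictor described in part (c)) and n = 31, a value chosen only because it reproduces the reported Adjusted R Square. The helper `adjusted_r2` is hypothetical.

```python
from math import sqrt


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = 0.874789981           # "R Square" from the regression output above
multiple_r = sqrt(r2)      # Multiple R, the square root of R^2

# ASSUMPTION: n = 31 and k = 1 are not stated in the output; they are one
# combination that reproduces the reported Adjusted R Square.
adj = adjusted_r2(r2, n=31, k=1)
print(round(multiple_r, 4), round(adj, 9))
```

Because the adjustment penalises each extra predictor, adjusted R² can rise when an insignificant variable is dropped, while plain R² can only stay the same or fall.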
b) Develop a multiple regression model to predict math success as a function of success in all other
subjects using the systematic approach described in this chapter. Is multicollinearity a problem?
From the above model we can see that the p-value of the “READING” variable is high: it is greater than
0.15, so reading is not a significant variable. We therefore exclude the variable “READING” and rerun the
regression model on the remaining variables.
After eliminating the variable “READING”, the p-values have decreased and the model gives a better
adjusted R² value than the earlier model. However, the p-values of some remaining variables are still
higher than 0.15, so they are not significant; we therefore remove the variable “CITIZENSHIP”.
After eliminating the variable “CITIZENSHIP”, the adjusted R² has increased, but the variable “WRITING”
still has a p-value higher than 0.15, so it is also not a significant variable. We now remove the variable
“WRITING” and see how close this brings us to a well-fitting model.
After eliminating the independent variables whose p-values exceed 0.15, the two variables that remain
fall within the 0.15 criterion, so they are significant. This model also gives the highest R² value, 0.87.
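The elimination rule applied above (drop the variable with the largest p-value above 0.15, refit, repeat) can be written as a short loop. This is a minimal sketch, not the Excel procedure: `backward_eliminate` is a hypothetical helper, and `fake_fit` is a hypothetical stand-in for rerunning the regression tool. Its variable names `x1` to `x5` and its p-values are invented purely to exercise the loop and are not the values from this assignment's output.

```python
def backward_eliminate(variables, fit_pvalues, alpha=0.15):
    """Repeatedly drop the variable with the largest p-value above alpha.

    `fit_pvalues` is any callable that refits the regression on the given
    variables and returns {variable: p_value}.
    Returns (kept_variables, removed_variables_in_order).
    """
    kept = list(variables)
    removed = []
    while kept:
        pvals = fit_pvalues(kept)
        worst = max(kept, key=lambda v: pvals[v])
        if pvals[worst] <= alpha:
            break  # every remaining variable is significant at level alpha
        kept.remove(worst)
        removed.append(worst)
    return kept, removed


# HYPOTHETICAL p-values keyed by the set of variables in the model.
FAKE_PVALUES = {
    frozenset({"x1", "x2", "x3", "x4", "x5"}):
        {"x1": 0.31, "x2": 0.62, "x3": 0.02, "x4": 0.44, "x5": 0.01},
    frozenset({"x1", "x3", "x4", "x5"}):
        {"x1": 0.27, "x3": 0.01, "x4": 0.35, "x5": 0.01},
    frozenset({"x1", "x3", "x5"}):
        {"x1": 0.19, "x3": 0.01, "x5": 0.01},
    frozenset({"x3", "x5"}):
        {"x3": 0.01, "x5": 0.01},
}


def fake_fit(variables):
    """Stand-in for a real regression refit; looks up invented p-values."""
    return FAKE_PVALUES[frozenset(variables)]


kept, removed = backward_eliminate(["x1", "x2", "x3", "x4", "x5"], fake_fit)
print(kept, removed)
```

Refitting after every removal matters because p-values change once a variable leaves the model; that is why the loop asks for fresh p-values on each pass instead of dropping every variable above 0.15 at once.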
The remaining question is whether multicollinearity is a problem.
Multicollinearity refers to a situation in which two or more explanatory variables in a multiple regression
model are highly linearly related. Yes, multicollinearity is a problem in this model, because the variables
“maths and science” show a strong positive relationship, which can cause errors in fitting the model and
interpreting the results.
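Multicollinearity is judged among the independent variables. One common diagnostic, which the original analysis does not use, is the variance inflation factor (VIF). With exactly two predictors, regressing one on the other gives R² = r², so VIF = 1 / (1 - r²) for each of them. The sketch below assumes that two-predictor case; the helper names and the data are hypothetical. A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a warning sign.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a model with exactly two of them."""
    r = pearson(x1, x2)
    return 1 / (1 - r ** 2)


# HYPOTHETICAL values for two predictor columns.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif = vif_two_predictors(x1, x2)
print(round(vif, 3))
```

Here r = 0.8, so the VIF is about 2.8: the predictors are related, but not strongly enough to trip the usual thresholds.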
c) Compare the models in parts (a) and (b). Are they the same? Why or why not?
The models in parts (a) and (b) are not the same. In part (a) we used a simple linear regression model
built on the basis of the correlation matrix: we first analysed which independent variables have a strong
positive relationship with math and found that math and science have a strong positive relationship. On
the basis of this analysis we used a simple linear regression model, because there is only one dependent
and one independent variable.
On the other hand, the model in part (b) is a multiple regression model, with more than one independent
variable and one dependent variable. Through this model we found that maths and science have a positive
relationship, which indicates multicollinearity. We reached this by eliminating variables one at a time on
the basis of their p-values, dropping any variable whose p-value was greater than 0.15, and in the end we
concluded that maths and science show a positive relationship.