
BUSINESS ANALYTICS

ASSIGNMENT

NEHA SINGH
PGDM Batch – 2019-21
Roll No. & Section: 10 'A'

Ohio Education Performance Results Year 2000

School District Math Writing Science Reading Citizenship All


Indian Hill 89 95 91 98 95 83
Wyoming 86 98 87 96 93 81
Mason City 85 96 86 92 94 72
Madiera 88 94 88 95 82 69
Mariemont 74 99 88 92 89 68
Sycamore 80 85 84 88 87 68
Forest Hills 73 93 88 91 85 67
Kings Local 78 92 78 86 82 64
Lakota 73 90 81 88 85 64
Loveland 72 85 86 93 86 61
Southwest 73 92 73 82 78 58
Fairfield 71 90 77 86 83 57
Oak Hills 75 88 79 86 77 57
Three Rivers 66 87 77 85 84 56
Milford 72 82 76 86 82 53
Ross 66 84 78 85 75 52
West Clermont 63 88 70 83 73 48
Reading 58 88 75 80 76 46
Princeton 59 83 63 75 76 46
Finneytown 61 79 62 71 67 45
Norwood 64 86 67 77 75 44
Lockland 52 88 64 79 82 41
Franklin City 49 85 67 79 70 40
Winton Woods 55 82 59 77 65 40
Northwest 51 75 61 74 62 38
North College Hill 50 77 57 76 66 35
Mount Healthy 40 87 53 72 62 32

Felicity Franklin 52 52 64 64 81 28
St. Bernard 40 81 48 59 41 26
Deer Park 40 69 52 66 43 25
Cincinnati Public 35 63 44 59 50 23

State Averages 60 79 68 79 72 46

Question
The State of Ohio Department of Education has a mandated ninth-grade proficiency test that
covers writing, reading, mathematics, citizenship (social studies), and science. The Excel file Ohio
Education Performance provides data on success rates (defined as the percent of students passing)
in school districts in the greater Cincinnati metropolitan area along with state averages.

a) Suggest the best regression model to predict math success as a function of success in the other
subjects by examining the correlation matrix; then run the regression tool for this set of
variables.

Here, science, reading, and citizenship are the independent variables, and math is the dependent variable.
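
The steps for part (a) can be sketched in Python; this is a minimal illustration, assuming the data above is saved as "Ohio Education Performance.xlsx" with one column per subject (the file name and column labels are assumptions, not part of the original workbook):

    import pandas as pd
    import statsmodels.api as sm

    # Load the district results; the file name and column labels are assumed
    df = pd.read_excel("Ohio Education Performance.xlsx")
    subjects = ["Math", "Writing", "Science", "Reading", "Citizenship"]

    # Correlation matrix: shows which subjects track math success most closely
    print(df[subjects].corr())

    # Regression of math success on the subjects chosen from the correlations
    X = sm.add_constant(df[["Science", "Reading", "Citizenship"]])
    y = df["Math"]
    model = sm.OLS(y, X).fit()
    print(model.summary())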

Regression output summary

R-squared (R2) is a statistical measure that represents the proportion of the variance in a dependent variable that is explained by the independent variable or variables in a regression model. Whereas correlation measures the strength of the relationship between an independent and a dependent variable, R-squared measures to what extent the variance of one variable explains the variance of the other. So, if the R2 of a model is 0.50, then approximately half of the observed variation can be explained by the model's inputs.
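
This interpretation follows from the standard definition of R2 (a textbook formula, not a value taken from the assignment's output):

    R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} = 1 - \frac{SS_{res}}{SS_{tot}}

where y_i are the observed math success rates, \hat{y}_i are the model's predictions, and \bar{y} is the mean.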

R Square 0.874789981
Adjusted R Square 0.870472394

The regression analysis shows that, after eliminating the correlated variables, the R2 value is 0.87, meaning the model explains about 87% of the variation in math success; the corresponding multiple R is about 0.93. Since this model is based on only two variables, it explains the data reasonably well but is still incomplete: even though the model was selected using the correlation matrix, there are other variables we need to take into account.
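
Because the steps below compare Adjusted R Square values as variables are dropped, it helps to recall how the adjustment penalizes extra predictors (again a textbook formula, not part of the Excel output):

    \bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}

where n is the number of districts in the sample and k is the number of independent variables. Adjusted R2 increases only when a newly added variable improves the fit by more than would be expected by chance, which is why it is the better yardstick when comparing models with different numbers of predictors.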

b) Develop a multiple regression model to predict math success as a function of success in all other
subjects using the systematic approach described in this chapter. Is multicollinearity a problem?

From the output of the full model we can see that the p-value of the "READING" variable is higher than 0.15, so it is not a significant variable. We therefore exclude "READING" and re-run the regression on the remaining variables.

After eliminating "READING" we can see that the p-values have come down and the model gives a better adjusted R2 than the earlier model. However, some of the remaining variables still have p-values above 0.15 and are therefore not significant, so we remove "CITIZENSHIP" next.

After eliminating "CITIZENSHIP" the adjusted R2 increases again, but "WRITING" still has a p-value higher than 0.15 and so is not a significant variable either. We now remove "WRITING" as well and see how much closer this brings us to the best-fitting model.

Having eliminated the independent variables whose p-values exceed 0.15, the two variables that remain fall within the 0.15 criterion and are therefore significant; this final model also gives the highest R2 value, 0.87.
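
The elimination procedure described above can be sketched as a short loop; this is a minimal illustration, assuming the same DataFrame and column labels as in part (a) and the 0.15 p-value cutoff used in the text:

    import pandas as pd
    import statsmodels.api as sm

    def backward_eliminate(df, target="Math",
                           predictors=("Writing", "Science", "Reading", "Citizenship"),
                           threshold=0.15):
        """Repeatedly drop the predictor whose p-value is largest and above the cutoff."""
        remaining = list(predictors)
        while remaining:
            X = sm.add_constant(df[remaining])
            model = sm.OLS(df[target], X).fit()
            pvalues = model.pvalues.drop("const")
            worst = pvalues.idxmax()
            if pvalues[worst] <= threshold:
                return model, remaining          # all remaining predictors are significant
            remaining.remove(worst)              # e.g. Reading, then Citizenship, then Writing
        return None, []

    # df = pd.read_excel("Ohio Education Performance.xlsx")   # file name is assumed
    # final_model, kept = backward_eliminate(df)
    # print(kept, round(final_model.rsquared_adj, 3))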
The remaining question is whether multicollinearity is a problem. Multicollinearity refers to a situation in which two or more explanatory variables in a multiple regression model are highly linearly related. Yes, multicollinearity is a problem in this model, because the subject scores are strongly positively related to one another (math and science in particular), which can cause errors in fitting the model and in interpreting the results.
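
One way to check this claim, assuming the same DataFrame as above, is to look at the correlations among the explanatory variables and at their variance inflation factors (VIFs); VIF values well above roughly 5 to 10 are commonly read as a sign of problematic multicollinearity:

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    df = pd.read_excel("Ohio Education Performance.xlsx")   # file name and columns are assumed
    predictors = ["Writing", "Science", "Reading", "Citizenship"]

    # Pairwise correlations among the explanatory variables
    print(df[predictors].corr())

    # Variance inflation factor for each predictor (constant column included in the design)
    X = sm.add_constant(df[predictors])
    for i, name in enumerate(X.columns):
        if name != "const":
            print(name, variance_inflation_factor(X.values, i))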

c) Compare the models in parts (a) and (b). Are they the same? Why or why not?

The models in parts (a) and (b) are not the same. In part (a) we used a simple linear regression model built from the correlation matrix: we first examined which independent variables had a strong positive relationship with math, found that math and science had the strongest positive relationship, and on that basis fit a simple regression with one dependent variable and one independent variable.
The model in part (b), on the other hand, is a multiple regression model with more than one independent variable and one dependent variable. Here we eliminated variables one by one whenever their p-values exceeded 0.15, and at the end of this process we again reached the conclusion that math and science show a strong positive relationship, which points to the multicollinearity discussed above.
