Econometrics Project: Prof. Coord.: Serban Daniela, Phd. Vîlceanu Letiţia-Gabriela
BUCUREŞTI
FACULTATEA DE ADMINISTRAREA AFACERILOR
cu predare în limbi străine
Econometrics project
Prof. coord.: Serban Daniela, PhD.
Vîlceanu Letiţia-Gabriela
The goal of the project is to discover the relationship between three variables: income,
consumption expenditure and the CPI. One expects that if an individual has a higher income,
his or her level of consumption expenditure will also increase, while the CPI also influences
the level of consumption.
The first chapter of this case study consists of two hypothesis tests, the second
describes the simple regression model, and the third presents the multiple
regression model. Finally, the last chapter comprises the analysis of the residuals.
Hypothesis Testing
1) According to the FRED website, the average Real Personal Income in 1970 was
2921.1 billion $. However, according to the OECD website, a study conducted
for 31 countries found a mean of 3000.1 billion $ with a standard deviation of
785.2 billion $. We would like to determine whether the data collected from the FRED
website are reliable or whether the OECD figures are to be trusted.
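$$Z_{calc} = \frac{\bar{x} - \mu_0}{\sqrt{s^2 / n}} = \frac{3000.1 - 2921.1}{\sqrt{(785.2)^2 / 31}} \approx 0.56$$
where \bar{x} = 3000.1 is the OECD sample mean, \mu_0 = 2921.1 is the FRED value, s = 785.2 and n = 31.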
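The calculation above can be reproduced with a short Python sketch. It only uses the figures quoted in the text; the two-sided alternative and the 5% significance level are assumptions, since the text does not state them, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, hypothesized_mean, sample_sd, n):
    """Z statistic for H0: the population mean equals hypothesized_mean."""
    standard_error = sqrt(sample_sd ** 2 / n)
    return (sample_mean - hypothesized_mean) / standard_error


# Figures quoted in the text: OECD sample against the FRED value.
z_calc = one_sample_z(3000.1, 2921.1, 785.2, 31)

# Assumption: two-sided test at alpha = 0.05 (not stated in the text).
alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z_calc)))
reject_h0 = abs(z_calc) > z_critical

print(f"z = {z_calc:.2f}, critical value = {z_critical:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under these assumptions the statistic stays inside the acceptance region, so the sample would not contradict the FRED value.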
2) According to the FRED website, between 1980 and 1984 the average
consumption was 4836.64 billion $, with a standard deviation of 142.5 billion $. For the
period 1985 to 1989, the average consumption was 5817.4 billion $, with a
standard deviation of 170.4 billion $. In both cases, the sample size is 5 years.
We test whether the average consumption in the first period was lower than
in the second period.
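$$Z_{calculated} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{4836.64 - 5817.4}{\sqrt{\frac{20306.25 + 29036.16}{5}}} \approx -9.87$$
where \mu_1 - \mu_2 = 0 under the null hypothesis, s_1^2 = (142.5)^2 = 20306.25, s_2^2 = (170.4)^2 = 29036.16 and n_1 = n_2 = 5.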
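As a check on the arithmetic, the two-sample statistic can be sketched in Python. The left-tailed alternative follows from the question asked in the text (first period lower than the second); the 5% significance level is an assumption, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Z statistic for H0: mu1 - mu2 equals hypothesized_diff."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2 - hypothesized_diff) / standard_error


# Period 1 (1980-1984) against period 2 (1985-1989), figures from the text.
z_calc = two_sample_z(4836.64, 142.5, 5, 5817.4, 170.4, 5)

# Assumption: left-tailed test (H1: mu1 < mu2) at alpha = 0.05.
alpha = 0.05
z_critical = NormalDist().inv_cdf(alpha)
reject_h0 = z_calc < z_critical

print(f"z = {z_calc:.2f}, critical value = {z_critical:.3f}, reject H0: {reject_h0}")
```

With only 5 observations per period, a t distribution would normally be preferred over the normal one; the sketch keeps the Z statistic because that is what the text uses.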
Regression Statistics
Multiple R 0.9986
R Square 0.997201
Adjusted R Square 0.997104
Standard Error 83.56681
Observations 31
Table 2
With a Multiple R of 0.9986, one can say that there is a strong relationship between
consumption and income. It is also a direct relationship, because the slope of the
regression line is positive.
The R Square, or coefficient of determination, represents the proportion of the variance
in the dependent variable that is predictable from the independent variable. It ranges from 0 to
1, and our value of 0.9972 indicates that about 99.7% of the variation in consumption can be
predicted from the level of income.
The Adjusted R Square is 0.9971, practically the same value, meaning that factors not
included in the model account for less than 1% of the variation in the dependent variable.
To conclude, the other factors do not have a high degree of influence on the level of consumption.
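The regression statistics in Table 2 can be recovered from the sums of squares reported in the ANOVA table of this model. The sketch below is a consistency check only; the variable names are illustrative, and k = 1 reflects the single regressor (income).

```python
from math import sqrt

# Sums of squares taken from the ANOVA table of the simple regression.
ss_regression = 72151672.0
ss_residual = 202518.9
n_obs = 31  # observations
k = 1       # number of regressors (income only)

ss_total = ss_regression + ss_residual

# R Square: share of the total variation explained by the regression.
r_square = ss_regression / ss_total

# Adjusted R Square: penalises R Square for the number of regressors.
adjusted_r_square = 1 - (1 - r_square) * (n_obs - 1) / (n_obs - k - 1)

# Multiple R is the square root of R Square.
multiple_r = sqrt(r_square)

# Standard error of the regression: square root of the residual mean square.
standard_error = sqrt(ss_residual / (n_obs - k - 1))

print(f"Multiple R        = {multiple_r:.4f}")
print(f"R Square          = {r_square:.6f}")
print(f"Adjusted R Square = {adjusted_r_square:.6f}")
print(f"Standard Error    = {standard_error:.5f}")
```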
Regression coefficient and intercept interpretation
The slope is positive, meaning that there is a positive relationship between the
two variables: if the level of income increases by 1 billion $, the level of
consumption increases by about 1.02 billion $.
In order to check the validity of the model, one must establish two hypotheses.
H0: The null hypothesis: μ1 = μ2 = ... = μ31
H1: The alternative hypothesis: At least two values are different.
Because the estimated slope is different from zero, we can select the alternative hypothesis
and confirm that the model is valid.
ANOVA
             df    SS          MS          F          Significance F
Regression    1    72151672    72151672    10331.87   1.4096E-38
Residual     29    202518.9    6983.412
Total        30    72354191
Table 3
From the ANOVA table, we can extract the SSresidual (Sum of Squares Residual), which is equal
to 202518.9; the MSregression has the value of 72151672 and the MSresidual has a value
of 6983.412.
The Fisher test (MSregression / MSresidual) gives an F value of 10331.87.
Also, one can observe that the Significance F, which is the p-value of this test, is
1.4096E-38, a value very close to 0. Therefore, the probability of making an error when
rejecting H0 is very small. Since this probability is lower than the significance level
α = 0.05, we can reject H0 in favor of H1 and conclude that the model is valid.
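The F value can be recomputed from the sums of squares and degrees of freedom in the ANOVA table. In this sketch the critical value of about 4.18 for F(0.05; 1, 29) is taken from a standard F table and is an assumption of the example, not a figure from the project; the function name is illustrative.

```python
def f_statistic(ss_regression, df_regression, ss_residual, df_residual):
    """Return (MS regression, MS residual, F) from ANOVA sums of squares."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression, ms_residual, ms_regression / ms_residual


# Values from the ANOVA table of the simple regression.
ms_reg, ms_res, f_calc = f_statistic(72151672.0, 1, 202518.9, 29)

# Assumption: tabulated critical value F(0.05; 1, 29), roughly 4.18.
F_CRITICAL = 4.18
model_is_valid = f_calc > F_CRITICAL

print(f"MS regression = {ms_reg:.0f}")
print(f"MS residual   = {ms_res:.3f}")
print(f"F             = {f_calc:.2f}")
print(f"F exceeds the critical value: {model_is_valid}")
```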
By looking at the two limits of the 95% confidence interval for the slope, which are 1.001093
and 1.042206, one can conclude that the inference can be extended to the whole population,
because the interval between the two limits does not contain the value zero. Also, the p-value
is lower than the significance level of 0.05, so the chance of being wrong when stating that
the slope is different from 0 is less than 5%.
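We have three assumptions to check, in order to make sure that this model can safely be used
for forecasting: the normality of the errors, homoskedasticity (constant variance of the
errors) and the independence of the residuals.
We study these assumptions with the help of the three graphs generated by the Excel
program.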
[Figures: plot against Sample Percentile; Income (Yd), expressed in billion $ Line Fit Plot, showing Consumption expenditure (Xd), expressed in billion $ and Predicted Consumption expenditure (Xd), expressed in billion $]
The first graph indicates a left skewness, but the points are spread closely around the
trendline.
The second graph shows that the points are, again, evenly spread around the mean,
which suggests that the model is homoskedastic, meaning that the errors have a constant
variance.
The third is used to determine the independence of the residuals. We use the Durbin-Watson
statistic. Given that the result is 1.272344, which is below 2, one can say that there is a
positive correlation between the residuals, meaning that one error influences the next one,
which tends to move in the same direction. According to the Durbin-Watson table, the result
is outside the limits required for independence, so the errors are autocorrelated.
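For reference, the Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below implements that definition; the residual series are hypothetical and are NOT the project's data, which is not reproduced in this text. The last line uses the common approximation DW ≈ 2(1 - ρ) to translate the reported value into an implied first-order autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that drift slowly (positive autocorrelation).
smooth_residuals = [1.0, 2.0, 3.0, 2.0, 1.0, -1.0, -2.0, -3.0, -2.0, -1.0]
# Hypothetical residuals that flip sign every period (negative autocorrelation).
alternating_residuals = [1, -1, 1, -1]

dw_smooth = durbin_watson(smooth_residuals)            # well below 2
dw_alternating = durbin_watson(alternating_residuals)  # well above 2

# Approximation DW ~ 2 * (1 - rho), applied to the value reported in the text.
reported_dw = 1.272344
implied_rho = 1 - reported_dw / 2

print(f"DW (smooth series)      = {dw_smooth:.4f}")
print(f"DW (alternating series) = {dw_alternating:.4f}")
print(f"Implied autocorrelation for DW = {reported_dw}: {implied_rho:.3f}")
```

Values near 2 indicate no first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation.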
Multiple regression model
                                       Coefficients   Standard Error   t Stat     P-value    Lower 95%
Intercept                              682.271        57.36494         11.89352   1.84E-12   564.7642
Income (Yd), expressed in billion $    0.875825       0.032335         27.0862    1.25E-21   0.809591
CPI, expressed in $                    5.201491       1.120287         4.643001   7.36E-05   2.906688
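The t Stat and Lower 95% columns follow directly from the coefficients and their standard errors: t = coefficient / standard error, and lower limit = coefficient - t_critical × standard error. The sketch below checks this; the critical value 2.048407 for a two-sided 95% interval with 28 degrees of freedom (31 observations minus 3 estimated parameters) is taken from a standard t table and is an assumption of the example.

```python
# (coefficient, standard error) pairs from the coefficients table.
coefficients = {
    "Intercept": (682.271, 57.36494),
    "Income": (0.875825, 0.032335),
    "CPI": (5.201491, 1.120287),
}

# Assumption: tabulated t critical value for 95% confidence, 28 degrees of freedom.
T_CRITICAL = 2.048407

results = {}
for name, (coef, std_err) in coefficients.items():
    t_stat = coef / std_err
    margin = T_CRITICAL * std_err
    results[name] = {
        "t_stat": t_stat,
        "lower_95": coef - margin,
        "upper_95": coef + margin,
    }

for name, row in results.items():
    print(
        f"{name:<10} t = {row['t_stat']:.4f}  "
        f"95% CI = [{row['lower_95']:.4f}, {row['upper_95']:.4f}]"
    )
```

None of the three intervals contains zero, which agrees with the very small p-values in the table.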
With a Multiple R of 0.99, one can say that there is a strong relationship between
consumption, income and the CPI. It is also a direct relationship, because both
slopes are positive.
The R Square, or coefficient of determination, represents the proportion of the variance
in the dependent variable that is predictable from the independent variables. It ranges from 0
to 1, and our value of 0.99 indicates that about 99% of the variation in consumption can be
predicted from the level of income and the CPI.
Again, the value of the Adjusted R Square is 0.99, meaning that factors not included in the
model account for only about 1% of the variation in the dependent variable. To conclude, the
other factors do not have a high degree of influence on the level of consumption.
Regression coefficient and intercept interpretation
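The slopes are both positive, meaning that we have a positive relationship between consumption
and each of the two factors. In other words, holding the CPI constant, an increase of 1 billion $
in income raises consumption by about 0.88 billion $, and holding income constant, an increase
of one unit in the CPI raises consumption by about 5.20 billion $.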
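The fitted equation implied by the coefficients table is consumption = 682.271 + 0.875825 × income + 5.201491 × CPI. A minimal Python sketch of how it would be used for a prediction follows; the input values (income of 5000 billion $, CPI of 100) are hypothetical and chosen only to illustrate the marginal effects.

```python
# Coefficients from the multiple regression table.
INTERCEPT = 682.271
B_INCOME = 0.875825
B_CPI = 5.201491


def predict_consumption(income, cpi):
    """Predicted consumption expenditure (billion $) from the fitted equation."""
    return INTERCEPT + B_INCOME * income + B_CPI * cpi


# Hypothetical inputs, not taken from the project's data set.
base = predict_consumption(5000.0, 100.0)
income_effect = predict_consumption(5001.0, 100.0) - base  # +1 billion $ income
cpi_effect = predict_consumption(5000.0, 101.0) - base     # +1 unit of CPI

print(f"Predicted consumption: {base:.2f} billion $")
print(f"Effect of +1 billion $ income: {income_effect:.4f}")
print(f"Effect of +1 unit CPI: {cpi_effect:.4f}")
```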
In order to check the validity of the model, one must establish two hypotheses.
H0: The null hypothesis: μ1 = μ2 = ... = μ31
H1: The alternative hypothesis: At least two values are different.
Because the estimated slopes are different from zero, we can select the alternative hypothesis
and confirm that the model is valid.
ANOVA
             df    SS          MS          F          Significance F
Regression    2    72239767    36119884    8838.727   6.12E-40
Residual     28    114423.4    4086.548
Total        30    72354191
From the ANOVA table, we can extract the SSresidual (Sum of Squares Residual), which is equal
to 114423.4; the MSregression has the value of 36119884 and the MSresidual has a value of
4086.548.
The Fisher test (MSregression / MSresidual) gives an F value of 8838.727.
Also, one can observe that the Significance F, which is the p-value of this test, is
6.12E-40, a value very close to 0. Therefore, the probability of making an error when
rejecting H0 is very small. Since this probability is lower than the significance level
α = 0.05, we can reject H0 in favor of H1 and conclude that the model is valid.
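The two ANOVA tables also allow a quick comparison of the simple and the multiple model. The sketch below is an addition that is not part of the original analysis: it computes the Adjusted R Square of both models and a partial F test for adding the CPI, which for a single added regressor should equal the square of the CPI t statistic. Variable names are illustrative.

```python
# Residual sums of squares from the two ANOVA tables (same 31 observations).
sse_simple = 202518.9     # income only, 29 residual degrees of freedom
sse_multiple = 114423.4   # income and CPI, 28 residual degrees of freedom
ss_total = 72354191.0
n_obs = 31


def adjusted_r_square(sse, n_regressors):
    """Adjusted R Square from the residual and total sums of squares."""
    ms_residual = sse / (n_obs - n_regressors - 1)
    ms_total = ss_total / (n_obs - 1)
    return 1 - ms_residual / ms_total


adj_simple = adjusted_r_square(sse_simple, 1)
adj_multiple = adjusted_r_square(sse_multiple, 2)

# Partial F test for the one regressor (CPI) added to the simple model.
added_regressors = 1
mse_multiple = sse_multiple / 28
partial_f = (sse_simple - sse_multiple) / added_regressors / mse_multiple

# t statistic of the CPI coefficient, from the coefficients table.
t_cpi = 4.643001

print(f"Adjusted R Square, simple model:   {adj_simple:.6f}")
print(f"Adjusted R Square, multiple model: {adj_multiple:.6f}")
print(f"Partial F for CPI: {partial_f:.3f} (t squared: {t_cpi ** 2:.3f})")
```

The agreement between the partial F and the squared t statistic is a useful check that the two tables come from the same data.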
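We have three assumptions to check, in order to make sure that this model can safely be used
for forecasting: the normality of the errors, homoskedasticity (constant variance of the
errors) and the independence of the residuals.
We study these assumptions with the help of the three graphs generated by the Excel
program.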
[Figures: plot against Sample Percentile; Residuals against CPI, expressed in $; CPI, expressed in $ Line Fit Plot, showing Consumption expenditure (Xd), expressed in billion $ and Predicted Consumption expenditure (Xd), expressed in billion $]
The first graph indicates a right skewness, but the points are spread closely around the
trendline.
The second graph shows that the points are, again, evenly spread around the mean,
which suggests that the model is homoskedastic, meaning that the errors have a constant
variance.
The third is used to determine the independence of the residuals. We use the Durbin-Watson
statistic. Given that the result is 2.01361, only marginally above 2, any negative correlation
between the residuals is negligible. Also, according to the Durbin-Watson table, the result
falls in the region of no autocorrelation, being very close to 2, so the residuals are
independent.
                                       Income (Yd), expressed in billion $   CPI, expressed in $
Income (Yd), expressed in billion $    1
CPI, expressed in $                    0.971317                              1
The level of correlation between these two regressors is 0.97, which means that they have a
strong relationship, indicating collinearity between them. However, both coefficients remain
statistically significant, so the overall model is not affected by this aspect.
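One common way to quantify collinearity is the variance inflation factor (VIF), which is not used in the original analysis and is shown here only as an illustration. With exactly two regressors, the VIF of each one is 1 / (1 - r²), where r is their correlation coefficient. Thresholds for a "high" VIF are rules of thumb that vary between textbooks, so the sketch does not apply one.

```python
def variance_inflation_factor(correlation):
    """VIF for a model with exactly two regressors: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - correlation ** 2)


# Correlation between income and CPI, from the correlation table.
r_income_cpi = 0.971317

# Share of the variation in one regressor explained by the other.
shared_variance = r_income_cpi ** 2
vif = variance_inflation_factor(r_income_cpi)

print(f"Shared variance (r squared): {shared_variance:.4f}")
print(f"Variance inflation factor:   {vif:.2f}")
```

The VIF indicates by how much the variance of each slope estimate is inflated compared with the case of uncorrelated regressors.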
Conclusion
With the help of both the simple and the multiple regression model, we can say that there is a
positive relationship between the three variables presented previously. We discovered that,
as the level of income increases, so does the level of consumption, while other factors have
an influence of only about 1%.
Appendix
Income (Yd), expressed in billion $    Consumption expenditure (Xd), expressed in billion $    CPI, expressed in $