
Sowjanya Lab6


MSc - Statistics for Managers - MMSCDM5A0323

Lab 6 (Individual)¹

Instructions: Students will answer the following questions in a software of their choosing (e.g., Excel, R, Stata, SPSS,
etc.). Students without previous experience in these software applications should use Microsoft Excel. Please submit
the completed lab assignment and associated Excel file (or other file depending on the program used) by the end of
class. The instructor will select and grade a single question from the lab assignment.

1) The German rail authority made an analysis of the number of train users on the network in the southern part of
the country since 1993 covering the months for June, July, and August. The Transport Authority was interested to
see if they could develop a relationship between the number of users and another easily measurable variable. In
this way they would have a forecasting tool. The variables they selected for developing their models were the
unemployment rate in this region and the number of foreign tourists visiting Germany. The following is the data
collected:

1a. Illustrate the relationship between the number of train users and unemployment rate on a scatter diagram.

¹ Questions were taken and adapted from Waller (2010).

[Scatter diagram: Train Users (mil.) on the vertical axis (0 to 50) against the unemployment rate on the horizontal axis (6 to 14), with linear trend line f(x) = −3.5570235833435 x + 57.2458559202385, R² = 0.505383139074642]

1b. Using simple regression analysis, what are your conclusions about the correlation between the number of train
users and the unemployment rate?
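The correlation between train users and the unemployment rate is −0.71090304 (R² ≈ 0.505). This is a fairly strong negative correlation: the number of train users tends to fall as the unemployment rate rises.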
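For a simple linear regression, the correlation coefficient can be recovered from the trend line shown on the chart: its magnitude is the square root of R² and its sign is the sign of the slope. A minimal Python sketch, using only the slope and R² reported on the chart for 1a (the function name is illustrative and not part of the lab):

```python
import math

def correlation_from_trendline(slope: float, r_squared: float) -> float:
    """Return Pearson's r for a simple linear fit: sign(slope) * sqrt(R^2)."""
    return math.copysign(math.sqrt(r_squared), slope)

# Slope and R^2 as displayed on the train users vs. unemployment chart (1a).
r_unemployment = correlation_from_trendline(-3.5570235833435, 0.505383139074642)
print(round(r_unemployment, 4))
```

This shortcut only holds for a straight-line fit with one independent variable; it does not carry over to the polynomial or multiple regression parts of the lab.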

1c. Illustrate the relationship between the number of train users and foreign tourists on a scatter diagram.

[Scatter diagram: Tourists (mil.) on the vertical axis (0 to 25) against train users on the horizontal axis (5 to 50), with linear trend line f(x) = 0.512984752489034 x − 0.499408201629187, R² = 0.915618133803102]

1d. Using simple regression analysis, what are your conclusions about the correlation between the number of train
users and the number of foreign tourists?
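The correlation between train users and the number of foreign tourists is approximately 0.957 (R² ≈ 0.9156). This is a very strong positive correlation, although not a perfect one, since a perfect correlation would be exactly 1.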

1e. In any given year, if the number of foreign tourists were estimated to be 10 million, what would be a forecast
for the number of train users?
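108.5088746 (million train users). Note that this is far outside the range of train users plotted in 1c (about 5 to 50 million), where 10 million tourists corresponds to roughly 20 million train users.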
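One way to sanity-check this forecast is to use the trend line from 1c. Assuming, as the axis ranges of that chart suggest, that x is train users (millions) and f(x) is foreign tourists (millions), the line can be solved for x at f(x) = 10. This is only a sketch under that assumption; a regression of train users on tourists fitted directly to the data would give a somewhat different value.

```python
# Trend line from the 1c chart: tourists = SLOPE * train_users + INTERCEPT
SLOPE = 0.512984752489034
INTERCEPT = -0.499408201629187

def train_users_for_tourists(tourists_mil: float) -> float:
    """Solve tourists = SLOPE * users + INTERCEPT for users (millions)."""
    return (tourists_mil - INTERCEPT) / SLOPE

forecast = train_users_for_tourists(10.0)
print(f"{forecast:.2f} million train users")
```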

1f. If a polynomial correlation (to the power of 2) between train users and foreign tourists was used, what are your
observations?
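y = −0.0103x² + 1.0267x − 5.8176
The coefficient a = −0.0103 is negative, which means the fitted curve is concave (it opens downward). It does not mean the correlation is negative: over the observed range the relationship is still positive, but the rate of increase slows as x grows.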
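The sign of the squared term describes curvature rather than the direction of the relationship. As a rough check, the slope of the fitted quadratic, dy/dx = 2ax + b, can be evaluated across the plotted range. The sketch below uses the coefficients reported above and assumes x runs from about 5 to 50, as on the 1c chart.

```python
# Coefficients of the fitted quadratic y = a*x**2 + b*x + c reported in 1f.
a, b, c = -0.0103, 1.0267, -5.8176

def slope_at(x: float) -> float:
    """First derivative of the quadratic: dy/dx = 2*a*x + b."""
    return 2 * a * x + b

# x-value where the curve stops rising (the vertex of the parabola).
turning_point = -b / (2 * a)

for x in (5, 25, 45):
    print(x, round(slope_at(x), 4))
print("turning point:", round(turning_point, 2))
```

The slope stays positive but shrinks toward the upper end of the range, which is what a negative squared coefficient combined with a positive linear coefficient implies here.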

1g. Using multiple regression analysis, what are your conclusions about the correlation between the number of
train users, the unemployment rate, and the number of foreign tourists? In this model, the number of train users is
the dependent variable and the other two variables are considered to be independent.
The model indicates that train users and the unemployment rate are negatively correlated (as unemployment rises, the number of train users falls), while train users and foreign tourists are positively correlated (as the number of tourists rises, the number of train users also rises).

2) The data in the table below is the amount of goods imported into the United States from 1960 until 2006
(Source: US Census Bureau, Foreign Trade division, www.census.gov/foreign-trade/statistics/historicalgoods, 8
June 2007.)
Year U.S. Imports Year U.S. Imports Year U.S. Imports
1960 14758 1976 124228 1992 536528
1961 14537 1977 151907 1993 589394
1962 16260 1978 176002 1994 668690
1963 17048 1979 212007 1995 749374
1964 18700 1980 249750 1996 803113
1965 21510 1981 265067 1997 876794
1966 25493 1982 247642 1998 918637
1967 26866 1983 268901 1999 1031784
1968 32991 1984 332418 2000 1226684
1969 35807 1985 338088 2001 1148231
1970 39866 1986 368425 2002 1167377
1971 45579 1987 409765 2003 1264307
1972 55797 1988 447189 2004 1477094
1973 70499 1989 477665 2005 1681780
1974 103811 1990 498438 2006 1861380
1975 98185 1991 491020
2a. Develop a time series scatter data for the complete data.

[Time series scatter diagram: U.S. Imports on the vertical axis (0 to 2,000,000) against year on the horizontal axis (1950 to 2010)]

2b. From the scatter diagram developed in Question 1 develop linear regression equations using just the following
periods to develop the equation where x is the year.
Also give the corresponding coefficient of determination: 1960–1964; 1965–1969; 1975–1979; 1985–1989; 1995–
1999; 2002–2005.

Years       Sum       RSQ        y = mx + b
1960–1964   81303     0.915785   y = 1039.5x - 2E+06
1965–1969   142667    0.971187   y = 3609.2x - 7E+06
1975–1979   762329    0.995441   y = 27942x - 6E+07
1985–1989   2041132   0.997261   y = 35792x - 7E+07
2002–2005   5590558   0.976743   y = 175600x - 4E+08
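The slope, intercept and coefficient of determination for each period can be recomputed from the imports table with ordinary least squares. The following is a minimal Python sketch, not the worksheet actually submitted: `fit_line` and the dictionary layout are illustrative names, and the 1995–1999 period is included because the question lists it.

```python
# U.S. imports for the years used in 2b, copied from the table in question 2.
IMPORTS = {
    1960: 14758, 1961: 14537, 1962: 16260, 1963: 17048, 1964: 18700,
    1965: 21510, 1966: 25493, 1967: 26866, 1968: 32991, 1969: 35807,
    1975: 98185, 1976: 124228, 1977: 151907, 1978: 176002, 1979: 212007,
    1985: 338088, 1986: 368425, 1987: 409765, 1988: 447189, 1989: 477665,
    1995: 749374, 1996: 803113, 1997: 876794, 1998: 918637, 1999: 1031784,
    2002: 1167377, 2003: 1264307, 2004: 1477094, 2005: 1681780,
}
PERIODS = [(1960, 1964), (1965, 1969), (1975, 1979),
           (1985, 1989), (1995, 1999), (2002, 2005)]

def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b; returns (m, b, r_squared)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    m = sxy / sxx
    return m, y_bar - m * x_bar, sxy ** 2 / (sxx * syy)

fits = {}
for start, end in PERIODS:
    years = list(range(start, end + 1))
    fits[(start, end)] = fit_line(years, [IMPORTS[y] for y in years])
    m, b, r2 = fits[(start, end)]
    print(f"{start}-{end}: y = {m:.1f}x + {b:.0f}, RSQ = {r2:.6f}")
```

Printing the intercept in full, rather than in the rounded form shown on a chart label, matters when the equation is later used for forecasting.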

2c. Using the relationships developed in Question 2, what would be the forecast values for 2006?
Forecast for 2006 (calculated from the 2002–2005 equation): 1836639
Actual value for 2006 (given): 1861380
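The trend-line labels in 2b show the intercept rounded to one significant figure (for example -4E+08), which is too coarse to forecast from directly: 175600 × 2006 − 4E+08 is negative. A sketch of one way around this, assuming the 2002–2005 values from the imports table, is to refit the line and evaluate it in mean-centred form, y = ȳ + m(x − x̄).

```python
# 2002-2005 imports, copied from the table in question 2.
years = [2002, 2003, 2004, 2005]
imports = [1167377, 1264307, 1477094, 1681780]

x_bar = sum(years) / len(years)
y_bar = sum(imports) / len(imports)

# Least-squares slope from the centred sums of products and squares.
m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(years, imports))
     / sum((x - x_bar) ** 2 for x in years))

def forecast(year: int) -> float:
    """Evaluate the fitted line in mean-centred form to avoid a huge intercept."""
    return y_bar + m * (year - x_bar)

actual_2006 = 1861380
error_pct = 100 * (forecast(2006) - actual_2006) / actual_2006
print(f"{forecast(2006):.1f}", f"{error_pct:.2f}%")
```

The same function can be called for any other year, although forecasts far beyond 2005 extrapolate a four-point trend and should be read with caution.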

2d. Compare these forecast values obtained in Question 3 with the actual value for 2006. What are your
comments?
Ans: The calculated value for 2006 (1836639) is a close match to the given value (1861380): the forecast is lower by 24741, or about 1.3%. The remaining difference may indicate room for improvement in the model or additional factors influencing the data.

2e. Develop the linear equation and the corresponding coefficient of determination for the complete data and
show this information on the scatter diagram.
Ans: The equation fitted to the complete data is a second-order polynomial of the form y = ax² + bx + c: y = 1111.3x² - 4E+06x + 4E+09.

2f. Develop the exponential equation and the corresponding coefficient of determination for the complete data
and show this information on the scatter diagram.
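An exponential trend of the form y = A·e^(kt) is commonly fitted by regressing ln(y) on t with ordinary least squares. The Python sketch below takes that approach on the complete imports table; it is an illustration under that assumption, not necessarily the method used in the submitted worksheet, and it measures time as years since 1960 so that A is the fitted level in 1960. The R² it reports is for the log-linear fit.

```python
import math

# U.S. imports 1960-2006, copied from the table in question 2.
IMPORTS = [
    14758, 14537, 16260, 17048, 18700, 21510, 25493, 26866, 32991, 35807,
    39866, 45579, 55797, 70499, 103811, 98185, 124228, 151907, 176002, 212007,
    249750, 265067, 247642, 268901, 332418, 338088, 368425, 409765, 447189,
    477665, 498438, 491020, 536528, 589394, 668690, 749374, 803113, 876794,
    918637, 1031784, 1226684, 1148231, 1167377, 1264307, 1477094, 1681780,
    1861380,
]

t = list(range(len(IMPORTS)))            # years since 1960
log_y = [math.log(v) for v in IMPORTS]   # natural log of imports

n = len(t)
t_bar, ly_bar = sum(t) / n, sum(log_y) / n
stt = sum((x - t_bar) ** 2 for x in t)
sty = sum((x - t_bar) * (y - ly_bar) for x, y in zip(t, log_y))
syy = sum((y - ly_bar) ** 2 for y in log_y)

k = sty / stt                        # fitted growth rate per year
A = math.exp(ly_bar - k * t_bar)     # fitted level in 1960
r_squared = sty ** 2 / (stt * syy)   # R^2 of the log-linear fit

print(f"y = {A:.0f} * exp({k:.4f} * (year - 1960)), RSQ = {r_squared:.4f}")
```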
2g. Develop the fourth power polynomial equation and the corresponding coefficient of determination for the
complete data and show this information on the scatter diagram.
[Scatter diagram: U.S. Imports on the vertical axis (0 to 2,000,000) against year on the horizontal axis (1950 to 2010), with fourth-power polynomial trend line f(x) = 1.2309832 x⁴ − 9737.5434 x³ + 28885686 x² − 38083600000 x + 18829100000000]
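When x is the calendar year, the individual terms of a fourth-power polynomial are many orders of magnitude larger than the imports values themselves, so a fitted value depends on near-cancellation between terms and can be sensitive to how the coefficients are rounded for display. The sketch below, using the coefficients as printed on the chart, simply compares term sizes at x = 2006. Refitting with a shifted variable such as x − 1960 is a common way to reduce this sensitivity.

```python
# Coefficients as displayed on the chart, highest power first.
COEFFS = [1.2309832, -9737.5434, 28885686, -38083600000, 18829100000000]

def term_magnitudes(x: float) -> list:
    """Absolute size of each polynomial term a_k * x**k at the given x."""
    degree = len(COEFFS) - 1
    return [abs(a * x ** (degree - i)) for i, a in enumerate(COEFFS)]

terms = term_magnitudes(2006)
largest = max(terms)
actual_2006 = 1861380  # imports for 2006 from the table in question 2

print([f"{t:.2e}" for t in terms])
print(f"largest term is about {largest / actual_2006:.0e} times the 2006 value")
```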

2h. Discuss your observations and results for this exercise including the forecasts that you have developed.
• There is a clear upward trend in U.S. imports over the years.
• The data show periods of significant growth, especially in the 1970s and 1980s.
• The RSQ of the data is 0.83, which means that 83% of the variability in U.S. imports is explained by the year. Because this value is close to 1, the model fits the data reasonably well.
• Using the linear equation (y = mx + b) for 2002–2005, I calculated a value for 2006 that was close to the given value. Using the same equation for the present year (2023) gives 4,821,832, which is plausible if the trend continues, although it extrapolates well beyond the fitted period.
