
Regression Analysis - Notes

##OLS in Python Statsmodels - hands on

Q. From the above output you can see the various attributes of the dataset.
The 'target' column holds the dependent values (housing prices), and the rest of the columns
are the independent values that influence the target values.
Let's find the relation between 'housing price' and 'average number of rooms per
dwelling' using statsmodels.
Assign the values of column "RM" (average number of rooms per dwelling) to variable
X.
Similarly, assign the values of the 'target' (housing price) column to variable Y.
Sample code: values = data_frame['attribute_name']

Ans:
X = dataset['RM']
Y = dataset['target']
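
Note: a minimal sketch of the column selection above. The notes do not show how 'dataset' is loaded, so the small DataFrame here is a stand-in with illustrative values only; the column names 'RM' and 'target' are taken from the question.

```python
import pandas as pd

# Stand-in for the real dataset: illustrative values, not the actual data.
dataset = pd.DataFrame({
    "RM": [6.5, 6.4, 7.2, 7.0],
    "target": [24.0, 21.6, 34.7, 33.4],
})

# Selecting a single column by name returns a pandas Series.
X = dataset["RM"]
Y = dataset["target"]

print(type(X).__name__)
print(X.shape, Y.shape)
```

Both X and Y keep the row index of the original DataFrame, so they stay aligned row by row when passed to a model.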

Q. Import statsmodels.api under the alias sm.

Ans:
import statsmodels.api as sm

Q. - Initialise the OLS model by passing the target (Y) and the attribute (X). Assign the
model to variable 'statsModel'.
- Fit the model and assign it to variable 'fittedModel'.
- Sample code for initialization: sm.OLS(target, attribute)

Ans:
statsModel = sm.OLS(Y, X)
fittedModel = statsModel.fit()

Q. Print the summary of fittedModel using the summary() function.

Ans:
print(fittedModel.summary())

Q. From the summary report, note down the R-squared value and assign it to variable
'r_squared' in the cell below.

Ans.
###Start code here
r_squared = fittedModel.rsquared
###End code(approx 1 line)
with open("output.txt", "w") as text_file:
    text_file.write("rsquared= %f\n" % r_squared)

Q. Print the value of r_squared.

Ans.
print(r_squared)

----------------------------
##

Q. Create a dataframe named 'X' that includes all the feature columns by
dropping the target column.
Assign the 'target' column to variable Y.

Ans.
X = dataset.drop('target', axis = 1)
Y = dataset['target']

Q. Now the dataframe X has just the features that influence the target.
Print the correlation matrix for dataframe X. Use the '.corr()' function to compute the
correlation matrix.
From the correlation matrix, note down the correlation value between 'CRIM' and
'PTRATIO' and assign it to variable 'corr_value'.

Ans.
###Start code here
#print correlation matrix for X
print(X.corr())
corr_value = X['CRIM'].corr(X['PTRATIO'])
print(corr_value)
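
Note: the pairwise call above gives the same number as reading the entry out of the full correlation matrix. The sketch below shows both routes on a synthetic stand-in DataFrame (the values are randomly generated assumptions; only the column names come from the notes).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in features, assumed for illustration only.
rng = np.random.default_rng(2)
X = pd.DataFrame({
    "CRIM": rng.exponential(3.0, size=50),
    "PTRATIO": rng.uniform(12.0, 22.0, size=50),
    "RM": rng.normal(6.3, 0.7, size=50),
})

# Full Pearson correlation matrix: one row and one column per feature.
corr_matrix = X.corr()

# Route 1: look the value up in the matrix by row and column label.
from_matrix = corr_matrix.loc["CRIM", "PTRATIO"]
# Route 2: correlate the two columns directly.
pairwise = X["CRIM"].corr(X["PTRATIO"])

print(corr_matrix)
print(from_matrix, pairwise)
```

The matrix is symmetric with ones on the diagonal, so corr_matrix.loc["PTRATIO", "CRIM"] returns the same value.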

Q. Import statsmodels.api as sm.
Initialize the OLS model with target Y and dataframe X (features).
Fit the model and print the summary.

Ans.
###Start code here
import statsmodels.api as sm
statsModel = sm.OLS(Y, X)
fittedModel = statsModel.fit()
print(fittedModel.summary())

###End code(approx 4 lines)

Q. From the summary report, note down the R-squared value and assign it to variable
'r_squared'.

Ans.
###Start code here
r_squared = fittedModel.rsquared
###End code(approx 1 line)
with open("output.txt", "w") as text_file:
    text_file.write("corr= %f\n" % corr_value)
    text_file.write("rsquared= %f\n" % r_squared)

Q. Print the value of r_squared.

Ans.
print(r_squared)
