Read the article Modeling Home Prices Using Realtor Data.
Create a directory (folder) named HomePricesProject. Store all work for this project in this directory.
Create an R Markdown document named Project1.Rmd inside the HomePricesProject directory. Complete all subsequent directions in this document.
Read the data from http://ww2.amstat.org/publications/jse/datasets/homes76.dat.txt into an R object named HP.
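One way to read a whitespace-delimited text file is read.table(). The sketch below is only an illustration: it assumes the file has a header row (inspect the raw file to confirm), and it runs on a small made-up temporary file so that it works offline. The helper name read_homes and the demo contents are hypothetical.

```r
# Hypothetical helper: read a whitespace-delimited file whose first row
# holds the column names (an assumption; check the real file first).
read_homes <- function(src) {
  read.table(src, header = TRUE)
}

# Made-up three-row file, used only to show the call working offline.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("a b c",
             "1 10.5 x",
             "2 11.5 y",
             "3 12.5 z"), demo_file)
demo <- read_homes(demo_file)
str(demo)

# For the assignment, the same call would take the URL given above:
# HP <- read_homes("http://ww2.amstat.org/publications/jse/datasets/homes76.dat.txt")
```

read.table() accepts a URL as well as a local path, so the commented line needs only a network connection.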
Remove columns 1, 7, 10, 15, 16, 17, 18, and 19 from HP and store the result back in HP.
Name the columns in HP price, size, lot, bath, bed, year, age, garage, status, active, and elem, respectively.
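Dropping columns by position and renaming the rest can be sketched with negative indexing and names(). The example below uses a toy data frame with 19 placeholder columns (not the real data) so that the column arithmetic can be checked: 19 columns minus 8 removed leaves the 11 names listed above.

```r
# Toy data frame with 19 placeholder columns (V1..V19) and two rows.
toy <- as.data.frame(matrix(seq_len(38), nrow = 2))

# Negative indices drop the listed column positions.
drop_cols <- c(1, 7, 10, 15, 16, 17, 18, 19)
toy <- toy[, -drop_cols]

# Assign the new names in order; the vector length must equal ncol(toy).
names(toy) <- c("price", "size", "lot", "bath", "bed", "year",
                "age", "garage", "status", "active", "elem")
names(toy)
```

Applying the same two steps to HP gives the data frame used in the remaining directions.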
Use the function datatable from the DT package to display the data from HP. Your data display should look similar to the one below.
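A minimal sketch of the datatable call, assuming the DT package is installed. The toy data frame and its values are made up; in the assignment the argument would be HP.

```r
# Made-up values, for illustration only.
toy <- data.frame(price = c(250, 310), size = c(1.8, 2.2))

# Guard the call so the sketch still runs where DT is not installed.
if (requireNamespace("DT", quietly = TRUE)) {
  widget <- DT::datatable(toy)  # builds an interactive HTML table widget
  print(class(widget))
}
```

Inside an R Markdown chunk, leaving the datatable() call as the last expression is enough for the table to render in the knitted HTML output.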
Explore the data for the variables size, lot, bath, bed, age, and garage that might help explain the price of a house. (Hint: use a matrix of correlations or a matrix of scatterplots.)
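The hint can be sketched with cor() for the correlation matrix and pairs() for the scatterplot matrix. The data below are simulated for illustration only; with the real data you would select the corresponding columns of HP.

```r
set.seed(42)
n <- 50

# Simulated stand-in data; the coefficients below are arbitrary.
toy <- data.frame(size = runif(n, 1, 3),
                  bed  = sample(2:5, n, replace = TRUE))
toy$price <- 100 + 80 * toy$size + 10 * toy$bed + rnorm(n, sd = 15)

vars <- c("price", "size", "bed")

# Pairwise Pearson correlations between the selected columns.
cm <- cor(toy[, vars])
round(cm, 2)

# One scatterplot for every pair of selected columns.
pairs(toy[, vars])
```

The price row of the correlation matrix and the price row of the scatterplot matrix show which predictors move with price and whether the relationship looks linear.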
What are the units for price and size?
Create a model named mod1 that regresses price on all of the variables in HP with the exception of status and year. Produce a summary of mod1 and graph the residual plot. Based on your residual plot, what modification might you make to mod1? Report the adjusted \(R^2\) value for mod1.
Create a new model (mod2) by modifying mod1 using the modification you suggested in the previous question. Report the adjusted \(R^2\) value for mod2. Do you see any improvements over mod1? Justify your answer using adjusted \(R^2\) values and residual plots.
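The mechanics of these two steps can be sketched on simulated data: fit a model on every column except one, extract the adjusted \(R^2\), draw a residual plot, and refit with a modification. The log transformation of the response used here is only one common candidate and is an assumption of this sketch, not necessarily the modification your own residual plot suggests.

```r
set.seed(1)
n <- 60

# Simulated stand-in data; `status` plays the role of a column to exclude.
toy <- data.frame(
  size   = runif(n, 1, 3),
  age    = runif(n, 0, 5),
  status = sample(c("act", "pen", "sld"), n, replace = TRUE)
)
toy$price <- exp(5 + 0.4 * toy$size - 0.05 * toy$age + rnorm(n, sd = 0.1))

# `.` means every other column; `- status` removes one of them.
fit1 <- lm(price ~ . - status, data = toy)
adj1 <- summary(fit1)$adj.r.squared

# Residuals against fitted values; look for curvature or a funnel shape.
plot(fitted(fit1), resid(fit1),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Assumed example modification: model log(price) instead of price.
fit2 <- lm(log(price) ~ . - status, data = toy)
adj2 <- summary(fit2)$adj.r.squared

c(adj1 = adj1, adj2 = adj2)
```

When the response itself is transformed, the two adjusted \(R^2\) values are measured on different scales, so the residual plots carry most of the weight in that comparison.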
Create a new model (mod3) by adding the interaction of bath and bed, and an age\(^2\) term, to mod1. Report the adjusted \(R^2\) value for mod3. Do you see any improvements over mod1? Justify your answer using adjusted \(R^2\) values and residual plots.
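Adding an interaction and a squared term to an existing fit can be sketched with update(). In an R formula, bath:bed is the interaction term, and the square must be wrapped in I() so that ^ is treated as arithmetic rather than formula syntax. The data are simulated for illustration.

```r
set.seed(2)
n <- 80

# Simulated stand-in data with arbitrary coefficients.
toy <- data.frame(bath = sample(c(1, 1.5, 2, 2.5, 3), n, replace = TRUE),
                  bed  = sample(2:5, n, replace = TRUE),
                  age  = runif(n, -3, 3))
toy$price <- 200 + 20 * toy$bath + 10 * toy$bed +
  5 * toy$bath * toy$bed + 3 * toy$age + 2 * toy$age^2 + rnorm(n, sd = 10)

base_fit <- lm(price ~ bath + bed + age, data = toy)

# Keep the response and all existing terms (`. ~ .`), then add two terms.
ext_fit <- update(base_fit, . ~ . + bath:bed + I(age^2))

names(coef(ext_fit))
summary(ext_fit)$adj.r.squared
```

Using update() keeps the new model tied to the original formula, so any change to the base model carries through.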
Create a new model (mod4) with all the terms in mod3 but using only edison and harris from the elem variable. Hint: When adding edison to the model, you can use I(elem == 'edison'). Your estimated coefficients should agree with those in the article. Report the adjusted \(R^2\) value for mod4.
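The hint can be sketched as follows: I(elem == "edison") creates a TRUE/FALSE indicator inside the formula, so only the named levels get their own coefficient and every other level is pooled into the baseline. The data are simulated, and the level names other1 and other2 are hypothetical.

```r
set.seed(3)
n <- 60

# Simulated stand-in data; "other1" and "other2" are made-up level names.
toy <- data.frame(
  size = runif(n, 1, 3),
  elem = sample(c("edison", "harris", "other1", "other2"), n, replace = TRUE)
)
toy$price <- 200 + 50 * toy$size +
  30 * (toy$elem == "edison") + 20 * (toy$elem == "harris") +
  rnorm(n, sd = 10)

# Each I(...) term contributes a single indicator coefficient.
ind_fit <- lm(price ~ size + I(elem == "edison") + I(elem == "harris"),
              data = toy)

names(coef(ind_fit))
```

Each indicator coefficient estimates the difference in mean price between that school and the pooled remaining schools, holding the other terms fixed.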
Conduct an F-test (anova(mod4, mod3)) and perform a 4-step hypothesis test to check whether the full model (mod3) is better than the reduced model (mod4). Does your p-value agree with the one presented in the article?
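A nested-model F-test can be sketched with anova(reduced, full). Here the full model uses every level of a factor and the reduced model keeps indicators for only two levels, so the null hypothesis is that the coefficients dropped from the full model are all zero. The data and level names are simulated for illustration.

```r
set.seed(4)
n <- 80

# Simulated stand-in data with five hypothetical levels.
lv <- c("edison", "harris", "other1", "other2", "other3")
toy <- data.frame(size = runif(n, 1, 3),
                  elem = sample(lv, n, replace = TRUE))
toy$price <- 200 + 50 * toy$size +
  30 * (toy$elem == "edison") + 20 * (toy$elem == "harris") +
  rnorm(n, sd = 10)

# Full model: one coefficient per non-baseline level of elem.
full_fit <- lm(price ~ size + elem, data = toy)

# Reduced model: indicators for two levels only; the rest are pooled.
reduced_fit <- lm(price ~ size + I(elem == "edison") + I(elem == "harris"),
                  data = toy)

# Row 2 of the table holds the F statistic and p-value for the comparison.
f_tab   <- anova(reduced_fit, full_fit)
p_value <- f_tab[["Pr(>F)"]][2]
f_tab
```

A large p-value means the data give no evidence that the extra terms in the full model are needed, which supports keeping the reduced model.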
Use mod4 to create a 95% prediction interval for a home with the following features: 1879 square feet, lot size category 4, two and a half baths, three bedrooms, built in 1975, two-car garage, and near Harris Elementary School. Interpret your results.
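A prediction interval for a single new observation can be sketched with predict() and interval = "prediction". The model, data, and new-home values below are simulated for illustration; for the assignment, the new data frame needs one column for every variable used in mod4, expressed in the same units and coding as the columns of HP.

```r
set.seed(5)
n <- 60

# Simulated stand-in data with arbitrary coefficients.
toy <- data.frame(size = runif(n, 1, 3),
                  bed  = sample(2:5, n, replace = TRUE))
toy$price <- 100 + 80 * toy$size + 10 * toy$bed + rnorm(n, sd = 15)

pi_fit <- lm(price ~ size + bed, data = toy)

# One row per home to predict; column names must match the model terms.
new_home <- data.frame(size = 1.9, bed = 3)

# Returns a matrix with columns fit, lwr, and upr.
pi95 <- predict(pi_fit, newdata = new_home,
                interval = "prediction", level = 0.95)
pi95
```

Setting interval = "confidence" instead gives the narrower interval for the mean price of all such homes rather than for one individual home.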