Read the article "Modeling Home Prices Using Realtor Data."
Create a directory (folder) named HomePricesProject.
Store all work for this project in this directory.
Create an R Markdown document named Project1.Rmd
inside the HomePricesProject directory. Complete all
subsequent directions in this document.
Read the data from http://ww2.amstat.org/publications/jse/datasets/homes76.dat.txt
into an R object named HP.
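A minimal sketch of this step as an R chunk. It assumes the URL is still reachable and that the file's first line is a header row (`header = TRUE`). If `head(HP)` shows column names sitting in the first data row, or data values used as names, switch the `header` setting.

```r
# Read the whitespace-delimited homes76 file into HP.
# ASSUMPTION: the first line of the file holds column names.
homes_url <- "http://ww2.amstat.org/publications/jse/datasets/homes76.dat.txt"
HP <- read.table(homes_url, header = TRUE)

dim(HP)   # the next step removes columns up to 19, so expect 19 columns
head(HP)  # check that the header row was read as names, not as data
```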
Remove columns 1, 7, 10, 15, 16, 17, 18, and 19 from
HP and store the result back in HP.
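Negative indices inside `[` drop columns by position, so this step is a single assignment. The sketch assumes HP still has all 19 original columns in their original order.

```r
# Drop columns 1, 7, 10 and 15 through 19 by position; a negative index
# excludes that column, and the result overwrites HP.
HP <- HP[, -c(1, 7, 10, 15:19)]

ncol(HP)  # 19 - 8 = 11 columns remain, one for each new name
```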
Name the columns in HP price,
size, lot, bath,
bed, year, age,
garage, status, active, and
elem, respectively.
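The renaming can be done by assigning a character vector to `names(HP)`; the names are matched to columns in order, so the vector needs exactly one entry per column.

```r
# Assign the new names in column order.
names(HP) <- c("price", "size", "lot", "bath", "bed", "year", "age",
               "garage", "status", "active", "elem")

str(HP)  # confirm the names and the type of each column
```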
Use the function datatable from the DT
package to display the data from HP. Your data display
should look similar to the one below.
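In an R Markdown chunk, the widget returned by `datatable()` renders as an interactive, searchable table in the HTML output. This sketch uses the default options and assumes DT is installed (`install.packages("DT")`); the appearance of your table may differ somewhat from the reference display.

```r
library(DT)

# datatable() returns an htmlwidget; printing it in a chunk draws the table.
datatable(HP)
```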
Explore the variables size, lot, bath, bed, age, and
garage to see which might help explain the price of a
house. (Hint: use a matrix of correlations and a matrix of scatterplots.)
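One way to follow the hint, assuming these seven columns are numeric in HP (check with `str(HP)`). `cor()` produces the correlation matrix and `pairs()` the scatterplot matrix; including `price` in both is what shows how each candidate predictor relates to it.

```r
vars <- c("price", "size", "lot", "bath", "bed", "age", "garage")

# Correlation matrix, rounded for readability; the "price" row shows how
# strongly each candidate predictor is linearly associated with price.
round(cor(HP[, vars]), 2)

# Scatterplot matrix of the same variables
pairs(HP[, vars])
```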
What are the units for price and
size?
Create a model named mod1 that regresses
price on all of the variables in HP with the
exception of status and year. Produce a
summary of mod1 and graph the residual plot. Based on your
residual plot, what modification might you make to mod1?
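A sketch of `mod1` using formula shorthand: `.` stands for every column of HP other than the response, and `- status - year` removes those two terms. `plot(mod1, which = 1)` draws the residuals-versus-fitted plot. `lm()` converts a character column such as `elem` to a factor automatically.

```r
# Regress price on every column of HP except status and year.
mod1 <- lm(price ~ . - status - year, data = HP)
summary(mod1)

# Residuals versus fitted values for mod1
plot(mod1, which = 1)

# Adjusted R^2 (also printed near the bottom of summary(mod1))
summary(mod1)$adj.r.squared
```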
Report the adjusted \(R^2\) value for mod1.
Create a new model (mod2) by modifying
mod1 using the modification you suggested in the previous
question. Report the adjusted \(R^2\)
value for mod2. Do you see any improvement over
mod1? Justify your answer using adjusted \(R^2\) values and residual plots.
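`update()` refits a model with a changed formula, which keeps `mod2` tied to `mod1`. The `I(size^2)` term below is only a hypothetical placeholder; substitute whatever modification your residual plot suggested. If your change transforms the response itself (for example, a log of price), the two adjusted \(R^2\) values describe different responses, so lean more on the residual plots for that comparison.

```r
# HYPOTHETICAL modification: add a squared size term. Replace it with the
# change suggested by your residual plot for mod1.
mod2 <- update(mod1, . ~ . + I(size^2))

# Adjusted R^2 for the two models, side by side
c(mod1 = summary(mod1)$adj.r.squared,
  mod2 = summary(mod2)$adj.r.squared)

# Residuals versus fitted values for mod2
plot(mod2, which = 1)
```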
Create a new model (mod3) by adding an interaction
term between bath and bed, and a squared age term (age\(^2\)), to mod1. Report the
adjusted \(R^2\) value for
mod3. Do you see any improvement over mod1?
Justify your answer using adjusted \(R^2\) values and residual plots.
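A sketch of `mod3` built with `update()`. In a formula, `bath:bed` adds only the product term (the main effects are already in `mod1`), and `I()` is needed so that `^` means arithmetic squaring rather than formula crossing.

```r
# Add the bath-by-bed interaction and a squared age term to mod1.
mod3 <- update(mod1, . ~ . + bath:bed + I(age^2))
summary(mod3)

c(mod1 = summary(mod1)$adj.r.squared,
  mod3 = summary(mod3)$adj.r.squared)

# Residuals versus fitted values for mod3
plot(mod3, which = 1)
```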
Create a new model (mod4) with all the terms in
mod3, but using only the edison and
harris levels of the elem variable. Hint: When adding
edison to the model, you can use
I(elem == 'edison'). Your estimated coefficients should
agree with those in the article. Report the adjusted \(R^2\) value for mod4.
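Following the hint, each `I(elem == ...)` term is a TRUE/FALSE indicator, so this sketch drops the `elem` factor from `mod3` and adds one indicator for each of the two schools; homes at the remaining schools then share the baseline. It assumes the school names are stored in lowercase exactly as in the hint, which `unique(HP$elem)` will confirm.

```r
unique(HP$elem)  # confirm how the school names are spelled

# Replace the elem factor with indicators for edison and harris only.
mod4 <- update(mod3, . ~ . - elem + I(elem == "edison") + I(elem == "harris"))
summary(mod4)    # compare these estimates with the article's

summary(mod4)$adj.r.squared
```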
Conduct an F-test (anova(mod4, mod3)) and perform a
four-step hypothesis test to check whether the full model
(mod3) is better than the reduced model (mod4).
Does your p-value agree with the one presented in the article?
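For two nested `lm` fits, `anova()` computes the partial F-statistic \(F = \dfrac{(\text{RSS}_{\text{reduced}} - \text{RSS}_{\text{full}})/(df_{\text{reduced}} - df_{\text{full}})}{\text{RSS}_{\text{full}}/df_{\text{full}}}\), where RSS is the residual sum of squares and \(df\) the residual degrees of freedom; here mod4 is the reduced model and mod3 the full one. The sketch also extracts the p-value so it can be compared directly with the article's.

```r
# Reduced model first, full model second
ftest <- anova(mod4, mod3)
ftest

# The second row holds the comparison; column "Pr(>F)" is the p-value
ftest[2, "Pr(>F)"]
```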
Use mod4 to create a 95% prediction interval for a home
with the following features: 1,879 square feet, lot size category 4, two and a
half baths, three bedrooms, built in 1975, a two-car garage, and near
Harris Elementary School. Interpret your results.
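A sketch using `predict()` with `interval = "prediction"`. Several inputs depend on how the data are coded, so the values marked ASSUMPTION should be checked against your answer to the units question and against the data. Because HP contains both `year` and `age`, the sketch recovers the age value for a 1975 build from a linear fit of age on year rather than assuming a formula; this is valid only if age is an exact linear recoding of year, which an R-squared of 1 confirms. The question does not state the listing status, so `active = 0` (assumed to mean "not an active listing") is also an assumption.

```r
# How are half baths coded? Check before setting bath below.
sort(unique(HP$bath))

# Recover the age value for a 1975 build from the data.
age_fit <- lm(age ~ year, data = HP)
summary(age_fit)$r.squared   # 1 means age is an exact linear recoding of year
age_1975 <- unname(predict(age_fit, newdata = data.frame(year = 1975)))

new_home <- data.frame(
  size   = 1.879,    # ASSUMPTION: size in thousands of square feet
  lot    = 4,
  bath   = 2.5,      # ASSUMPTION: use 2.1 instead if a half bath is coded as 0.1
  bed    = 3,
  age    = age_1975,
  garage = 2,
  active = 0,        # ASSUMPTION: 0/1 indicator, home not an active listing
  elem   = "harris"
)

# 95% prediction interval for a single new home
predict(mod4, newdata = new_home, interval = "prediction", level = 0.95)
```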