1. Explore the variables size, lot, bath, bed, age, and garage, any of which might help explain the price of a house. (Hint: use a correlation matrix or a scatterplot matrix.)

  2. What are the units for price and size?

  1. Create a model named mod1 that regresses price on all of the variables in HP except status and year. Produce a summary of mod1 and graph the residual plot. Based on your residual plot, what modification might you make to mod1? Report the adjusted \(R^2\) value for mod1.
  1. Create a new model (mod2) by applying to mod1 the modification you suggested in the previous question. Report the adjusted \(R^2\) value for mod2. Do you see any improvement over mod1? Justify your answer using adjusted \(R^2\) values and residual plots.

  2. Create a new model (mod3) by adding two terms to mod1: an interaction between bath and bed, and the quadratic term age\(^2\). Report the adjusted \(R^2\) value for mod3. Do you see any improvement over mod1? Justify your answer using adjusted \(R^2\) values and residual plots.

  1. Create a new model (mod4) with all the terms in mod3, but using only edison and harris from the elem variable. Hint: to add edison to the model, you can use I(elem == 'edison'). Your estimated coefficients should agree with those in the article. Report the adjusted \(R^2\) value for mod4.

  2. Conduct an F-test (anova(mod4, mod3)) and perform a 4-step hypothesis test to check whether the full model (mod3) is better than the reduced model (mod4). Does your p-value agree with the one presented in the article?

  1. Compute the training mean square prediction error for all four of the models. Which model has the smallest training mean square prediction error? Do you think this model will also have the smallest test mean square prediction error?
  1. Use mod4 to create a 95% prediction interval for a home with the following features: 1879 square feet, lot size category 4, two and a half baths, three bedrooms, built in 1975, two-car garage, and near Harris Elementary School. Interpret your results.
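
The mechanics behind the last few questions (indicator terms via I(), a nested-model F-test, training mean square prediction error, and a prediction interval) can be sketched on made-up data. This is only an illustration under stated assumptions: the data frame `toy`, its simulated values, the school names other than edison and harris, the reduced set of predictors, and the names `full`, `reduced`, and `train_mspe` are all hypothetical and are not taken from HP or the article.

```r
# Hypothetical stand-in data; NOT the HP data set from the assignment.
set.seed(1)
n <- 80
toy <- data.frame(
  size = runif(n, 1.4, 2.9),
  bath = sample(c(1, 1.5, 2, 2.5, 3), n, replace = TRUE),
  bed  = sample(2:5, n, replace = TRUE),
  elem = factor(sample(c("adams", "edison", "harris", "parker"),
                       n, replace = TRUE))
)
# Simulated response (assumed coefficients, for illustration only).
toy$price <- 150 + 60 * toy$size + 10 * toy$bath +
  30 * (toy$elem == "edison") + 20 * (toy$elem == "harris") +
  rnorm(n, sd = 25)

# Full model: every level of elem gets its own indicator.
full <- lm(price ~ size + bath * bed + elem, data = toy)

# Reduced model: keep only the edison and harris indicators.
reduced <- lm(price ~ size + bath * bed +
                I(elem == "edison") + I(elem == "harris"),
              data = toy)

# Nested-model F-test: reduced model first, full model second.
ftest <- anova(reduced, full)
print(ftest)

# Adjusted R^2 for each model.
adj_r2 <- c(full    = summary(full)$adj.r.squared,
            reduced = summary(reduced)$adj.r.squared)
print(adj_r2)

# Training mean square prediction error: mean of squared residuals.
train_mspe <- function(model) mean(residuals(model)^2)
mspe <- c(full = train_mspe(full), reduced = train_mspe(reduced))
print(mspe)

# 95% prediction interval for one hypothetical new home.
new_home <- data.frame(size = 1.9, bath = 2.5, bed = 3, elem = "harris")
pi95 <- predict(reduced, newdata = new_home,
                interval = "prediction", level = 0.95)
print(pi95)
```

On the training data the full model can never have a larger mean squared residual than a model nested inside it, which is why training error alone is a weak guide to test error.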