Bootstrapping Duke Forest houses
In this code-along, we will use bootstrapping to construct confidence intervals.
Packages
We will use tidyverse and tidymodels for data exploration and modeling, respectively, and the openintro package for the data.
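A minimal sketch of the setup, assuming all three packages are already installed:

```r
# Load the packages named above.
library(tidyverse)   # data wrangling and plotting
library(tidymodels)  # modeling; also attaches infer, used for bootstrapping
library(openintro)   # provides the duke_forest data set
```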
Data
The data are on houses that were sold in the Duke Forest neighborhood of Durham, NC around November 2020. They were originally scraped from Zillow and can be found in the duke_forest data set in the openintro R package.
glimpse(duke_forest)
Rows: 98
Columns: 13
$ address <chr> "1 Learned Pl, Durham, NC 27705", "1616 Pinecrest Rd, Durha…
$ price <dbl> 1520000, 1030000, 420000, 680000, 428500, 456000, 1270000, …
$ bed <dbl> 3, 5, 2, 4, 4, 3, 5, 4, 4, 3, 4, 4, 3, 5, 4, 5, 3, 4, 4, 3,…
$ bath <dbl> 4.0, 4.0, 3.0, 3.0, 3.0, 3.0, 5.0, 3.0, 5.0, 2.0, 3.0, 3.0,…
$ area <dbl> 6040, 4475, 1745, 2091, 1772, 1950, 3909, 2841, 3924, 2173,…
$ type <chr> "Single Family", "Single Family", "Single Family", "Single …
$ year_built <dbl> 1972, 1969, 1959, 1961, 2020, 2014, 1968, 1973, 1972, 1964,…
$ heating <chr> "Other, Gas", "Forced air, Gas", "Forced air, Gas", "Heat p…
$ cooling <fct> central, central, central, central, central, central, centr…
$ parking <chr> "0 spaces", "Carport, Covered", "Garage - Attached, Covered…
$ lot <dbl> 0.97, 1.38, 0.51, 0.84, 0.16, 0.45, 0.94, 0.79, 0.53, 0.73,…
$ hoa <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ url <chr> "https://www.zillow.com/homedetails/1-Learned-Pl-Durham-NC-…
Model
Fit a linear model predicting price of houses from their area.
# add code here
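One possible way to fill this in is sketched below. It assumes the default lm engine of linear_reg(); the object names price_fit and model_summary are made up for illustration.

```r
library(tidyverse)
library(tidymodels)
library(openintro)

# Fit a simple linear regression of price on area
# (linear_reg() uses the "lm" engine by default).
price_fit <- linear_reg() |>
  fit(price ~ area, data = duke_forest)

# One row per coefficient: intercept and slope for area
model_summary <- tidy(price_fit)
model_summary
```

The estimate in the area row is the slope: the predicted change in price, in dollars, for each additional square foot of area.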
Bootstrap confidence interval
- Calculate the observed fit (slope):
# add code here
- Take n bootstrap samples and fit models to each one:
# add code here
- Why do we set a seed before taking the bootstrap samples?
To get the same random samples each time we run the code / render the document.
- Make a histogram of the bootstrap samples to visualize the bootstrap distribution.
# add code here
- Compute the 95% confidence interval as the middle 95% of the bootstrap distribution:
# add code here
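The steps above can be sketched with the infer package, which is attached by tidymodels. This is one possible approach, not the only one. The seed value, the choice of 1,000 resamples, and the object names are assumptions made for illustration.

```r
library(tidyverse)
library(tidymodels)
library(openintro)

# Observed fit: intercept and slope from the original sample
observed_fit <- duke_forest |>
  specify(price ~ area) |>
  fit()

# Set a seed (arbitrary value) so the resamples are reproducible
set.seed(1234)

# Draw 1,000 bootstrap samples (an assumed number) and fit a model to each
boot_fits <- duke_forest |>
  specify(price ~ area) |>
  generate(reps = 1000, type = "bootstrap") |>
  fit()

# Histogram of the bootstrapped slopes
boot_fits |>
  filter(term == "area") |>
  ggplot(aes(x = estimate)) +
  geom_histogram(bins = 30) +
  labs(x = "Bootstrapped slope (price per unit of area)", y = "Count")

# 95% percentile interval: the middle 95% of the bootstrap distribution
ci_95 <- get_confidence_interval(
  boot_fits,
  point_estimate = observed_fit,
  level = 0.95,
  type = "percentile"
)
ci_95
```

The result has one row per term, so the interval for the slope is in the row where term is area.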
Changing confidence level
Modify the code that computes the 95% confidence interval to create a 90% confidence interval.
# add code here
Modify the code that computes the 95% confidence interval to create a 99% confidence interval.
# add code here
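To compare levels side by side, the sketch below computes percentile intervals for the slope at 90%, 95%, and 99% from a single set of bootstrap fits. The helper percentile_ci() is hypothetical and written here for illustration; the seed and the number of resamples are also assumptions.

```r
library(tidyverse)
library(tidymodels)
library(openintro)

set.seed(1234)  # arbitrary seed for reproducibility

# Bootstrapped slopes for area (1,000 resamples is an assumed choice)
slopes <- duke_forest |>
  specify(price ~ area) |>
  generate(reps = 1000, type = "bootstrap") |>
  fit() |>
  filter(term == "area") |>
  pull(estimate)

# Hypothetical helper: middle `level` proportion of a numeric vector
percentile_ci <- function(x, level) {
  alpha <- 1 - level
  quantile(x, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Lower bound, upper bound, and width at each confidence level
ci_table <- tibble(level = c(0.90, 0.95, 0.99)) |>
  mutate(
    lower = map_dbl(level, \(l) percentile_ci(slopes, l)[1]),
    upper = map_dbl(level, \(l) percentile_ci(slopes, l)[2]),
    width = upper - lower
  )
ci_table
```

The width column makes the trade-off concrete: a higher confidence level takes a larger middle portion of the same bootstrap distribution, so the interval cannot get narrower as the level increases.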
- Which confidence level produces the most accurate confidence interval (90%, 95%, 99%)? Explain.
Add response here.
- Which confidence level produces the most precise confidence interval (90%, 95%, 99%)? Explain.
Add response here.
- If we want to be very certain that we capture the population parameter, should we use a wider or a narrower interval? What drawbacks are associated with using a wider interval?
Add response here.