Modeling and inference
The allbacks data frame gives measurements on the volume and weight of 15 books, some of which are paperback and some of which are hardback.
- volume: cubic centimetres
- area: square centimetres
- weight: grams
- cover: hb (hardback) or pb (paperback)
# A tibble: 15 × 4
volume area weight cover
<dbl> <dbl> <dbl> <fct>
1 885 382 800 hb
2 1016 468 950 hb
3 1125 387 1050 hb
4 239 371 350 hb
5 701 371 750 hb
6 641 367 600 hb
7 1228 396 1075 hb
8 412 0 250 pb
9 953 0 700 pb
10 929 0 650 pb
11 1492 0 975 pb
12 419 0 350 pb
13 1010 0 950 pb
14 595 0 425 pb
15 1034 0 725 pb
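library(tidymodels)  # provides linear_reg(), fit(), and tidy()
# Assumes allbacks is already loaded (the data set ships with the DAAG package)
allbacks_2_fit <- linear_reg() |>
  fit(weight ~ volume + cover, data = allbacks)
tidy(allbacks_2_fit)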
# A tibble: 3 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 198. 59.2 3.34 0.00584
2 volume 0.718 0.0615 11.7 0.0000000660
3 coverpb -184. 40.5 -4.55 0.000672
Slope - volume: Holding cover constant, for each additional cubic centimetre of volume, the model predicts the weight to be higher, on average, by 0.718 grams.
Slope - cover: Holding volume constant, the model predicts that paperback books weigh, on average, 184 grams less than hardback books.
Intercept: The model predicts that hardback books with a volume of 0 cubic centimetres weigh 198 grams, on average. (This doesn't make sense in context, since no book has zero volume.)
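To make these interpretations concrete, the fitted equation can be evaluated by hand. The sketch below plugs the rounded estimates from the tidy() output into \(\widehat{weight} = 198 + 0.718 \times volume - 184 \times coverpb\). The helper name predict_weight and the example volume of 1000 cubic centimetres are hypothetical, chosen only for illustration.

```r
# Rounded coefficient estimates copied from the tidy() output
b_intercept <- 198
b_volume    <- 0.718
b_coverpb   <- -184

# Hypothetical helper: predicted weight (grams) for a volume (cm^3) and cover type
predict_weight <- function(volume, cover) {
  is_pb <- as.numeric(cover == "pb")  # indicator: 1 for paperback, 0 for hardback
  b_intercept + b_volume * volume + b_coverpb * is_pb
}

hb_weight <- predict_weight(1000, "hb")  # 198 + 718
pb_weight <- predict_weight(1000, "pb")  # 198 + 718 - 184

# At the same volume, the two predictions differ by the cover slope
hb_weight - pb_weight
```

With volume fixed at 1000 cubic centimetres, the only difference between the two predictions is the cover term, which is what "holding volume constant" means in the interpretation.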
\(R^2\) is the percentage of variability in the outcome explained by the regression model.
weight ~ volume
weight ~ volume + cover
Adjusted \(R^2\) adds a penalty to \(R^2\) for additional predictors in the model, and is therefore a (more) objective measure for comparing models with different numbers of predictors.
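One common form of the penalty is \(R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}\), where \(n\) is the number of observations and \(p\) is the number of predictor terms. The sketch below applies it to the rounded \(R^2\) values 0.803 and 0.927 reported for the one-predictor and two-predictor models of the 15 books. The helper name adjusted_r2 is hypothetical, and because the inputs are rounded, the results may differ from software output in the third decimal place.

```r
# Hypothetical helper: adjusted R^2 from R^2, sample size n, and number of predictor terms p
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n_books <- 15  # number of books in the data

adj_1 <- adjusted_r2(0.803, n = n_books, p = 1)  # weight ~ volume
adj_2 <- adjusted_r2(0.927, n = n_books, p = 2)  # weight ~ volume + cover

round(c(adj_1, adj_2), 3)
```

The penalty factor \((n - 1)/(n - p - 1)\) grows with \(p\), so an extra predictor raises adjusted \(R^2\) only if it improves \(R^2\) by enough to offset the penalty.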
The model with volume and cover as predictors has the higher adjusted \(R^2\) (0.915 versus 0.787), and it is therefore the preferable model for predicting weight.
weight ~ volume
# A tibble: 1 × 2
r.squared adj.r.squared
<dbl> <dbl>
1 0.803 0.787
weight ~ volume + cover
# A tibble: 1 × 2
r.squared adj.r.squared
<dbl> <dbl>
1 0.927 0.915
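These fit statistics can also be reproduced without extra packages. The sketch below is a stand-in for the tidymodels workflow, not the code that produced the output above: it re-enters the volume, weight, and cover columns from the printed table by hand, fits both models with base R's lm(), and reads \(R^2\) and adjusted \(R^2\) from summary(). The names books, fit_volume, fit_volume_cover, and fit_stats are hypothetical.

```r
# Data re-entered by hand from the printed table (area omitted, since neither model uses it)
books <- data.frame(
  volume = c(885, 1016, 1125, 239, 701, 641, 1228,
             412, 953, 929, 1492, 419, 1010, 595, 1034),
  weight = c(800, 950, 1050, 350, 750, 600, 1075,
             250, 700, 650, 975, 350, 950, 425, 725),
  cover  = factor(rep(c("hb", "pb"), times = c(7, 8)))  # rows 1-7 hb, rows 8-15 pb
)

# Base R fits of the two models being compared
fit_volume       <- lm(weight ~ volume, data = books)
fit_volume_cover <- lm(weight ~ volume + cover, data = books)

# Hypothetical helper: pull R^2 and adjusted R^2 out of a fitted lm object
fit_stats <- function(fit) {
  s <- summary(fit)
  c(r.squared = s$r.squared, adj.r.squared = s$adj.r.squared)
}

stats_1 <- fit_stats(fit_volume)
stats_2 <- fit_stats(fit_volume_cover)

round(rbind(stats_1, stats_2), 3)
```

Because lm() uses the same least squares fit as the default linear_reg() engine, the rounded values should agree with the two tibbles shown above.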
When interpreting slope coefficients for multiple regression models we need to state that one predictor is kept constant while the other increases.
Adjusted R-squared is useful when comparing models with different numbers of predictors - it helps you balance model complexity with explanatory power.