Modeling and inference
Describe: What is the relationship between cars’ weights and their mileage?
Predict: What is your best guess for the MPG of a car that weighs 3,500 pounds?
Outcome: Variable whose behavior or variation you are trying to understand, on the y-axis (aka response variable, dependent variable)
Predictor(s): Other variable(s) that you want to use to explain the variation in the outcome, on the x-axis (aka explanatory variable(s), independent variable(s))
Model function: The regression line for predicting the outcome variable from the predictor variable(s), generally consisting of an intercept and a slope for each predictor
Predicted value: Output of the model function, which gives the typical (expected) value of the outcome conditional on the predictor(s)
Residuals: A measure of how far each case’s observed value is from its predicted value (based on a particular model)
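For a single predictor, the last three definitions can be written compactly. The notation here is an assumption for illustration and does not come from the slides: $x$ is the predictor, $y_i$ the observed outcome for case $i$, $b_0$ the intercept, and $b_1$ the slope.

$$
\hat{y}_i = b_0 + b_1 x_i, \qquad e_i = y_i - \hat{y}_i
$$

Here $\hat{y}_i$ is the predicted value and $e_i$ is the residual, taken as observed minus predicted, so a positive residual means the case sits above the line.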
mpg | wt (1,000 lb) |
---|---|
21 | 2.62 |
21 | 2.875 |
22.8 | 2.32 |
21.4 | 3.215 |
18.7 | 3.44 |
18.1 | 3.46 |
... | ... |
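To make the definitions concrete, here is a minimal sketch that fits a least-squares line to the six rows shown above and prints each case's predicted value and residual. It rests on two assumptions: `wt` is taken to be in thousands of pounds, and only the six displayed rows are used, so the numbers will not match a fit to the full dataset. The helper name `fit_line` is hypothetical.

```python
# Illustration only: the six rows displayed in the table, not the full dataset.
wt = [2.62, 2.875, 2.32, 3.215, 3.44, 3.46]   # predictor (assumed: thousands of pounds)
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1]    # outcome


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_line(wt, mpg)

# Predicted value: output of the model function for each observed weight.
predicted = [intercept + slope * xi for xi in wt]

# Residual: observed minus predicted, one per case.
residuals = [yi - pi for yi, pi in zip(mpg, predicted)]

for xi, yi, pi, ri in zip(wt, mpg, predicted, residuals):
    print(f"wt={xi:.3f}  observed={yi:.1f}  predicted={pi:.2f}  residual={ri:+.2f}")
```

With an intercept in the model, the least-squares residuals sum to zero (up to rounding), which is a quick sanity check on the fit.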
The variation around the model is just as important as the model, if not more!
Statistics is the explanation of variation in the context of what remains unexplained.
Predict / classify: Plug the value(s) of the predictor(s) into the model to obtain the predicted value of the outcome
Describe: Quantify the relationship between predictor(s) and outcome with slopes
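Both uses can be sketched with one fitted line. The example below is a rough illustration under stated assumptions: it uses only the six (mpg, wt) rows displayed earlier, treats `wt` as thousands of pounds, and introduces a hypothetical helper `predict_mpg`. A fit to the full dataset would give different numbers.

```python
# Illustration only: six displayed rows, wt assumed to be in thousands of pounds.
wt = [2.62, 2.875, 2.32, 3.215, 3.44, 3.46]
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1]

# Least-squares slope and intercept for mpg = intercept + slope * wt.
n = len(wt)
wt_bar = sum(wt) / n
mpg_bar = sum(mpg) / n
slope = (
    sum((w - wt_bar) * (m - mpg_bar) for w, m in zip(wt, mpg))
    / sum((w - wt_bar) ** 2 for w in wt)
)
intercept = mpg_bar - slope * wt_bar


def predict_mpg(weight_lb):
    """Predict: plug a weight in pounds into the fitted line."""
    return intercept + slope * (weight_lb / 1000)


# Predict: best guess for a car weighing 3,500 pounds.
print(f"Predicted MPG at 3,500 lb: {predict_mpg(3500):.1f}")

# Describe: the slope quantifies the relationship between weight and mileage.
print(f"Each additional 1,000 lb is associated with {slope:.2f} mpg, on average.")
```

The slope is read as an average association across cars, not as what would happen to one car if its weight changed.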
How do self-driving cars decide whether an object in front of them is a human, another car, or a trash can?
How does an online shopping website decide which ad to serve to you for the next item you might purchase?
What happens if either of these gets it wrong?
Byambasukh, Oyuntugs, Harold Snieder, and Eva Corpeleijn. “Relation between leisure time, commuting, and occupational physical activity with blood pressure in 125 402 adults: the lifelines cohort.” Journal of the American Heart Association 9.4 (2020): e014313.
Background: Whether all domains of daily‐life moderate‐to‐vigorous physical activity (MVPA) are associated with lower blood pressure (BP) and how this association depends on age and body mass index remains unclear.
Methods and Results: In the population‐based Lifelines cohort (N=125,402), MVPA was assessed by the Short Questionnaire to Assess Health‐Enhancing Physical Activity, a validated questionnaire in different domains such as commuting, leisure‐time, and occupational PA. BP was assessed using the last 3 of 10 measurements after 10 minutes’ rest in the supine position. Hypertension was defined as systolic BP ≥140 mm Hg and/or diastolic BP ≥90 mm Hg and/or use of antihypertensives. In regression analysis, higher commuting and leisure‐time but not occupational MVPA related to lower BP and lower hypertension risk. Commuting‐and‐leisure‐time MVPA was associated with BP in a dose‐dependent manner. β Coefficients (95% CI) from linear regression analyses were −1.64 (−2.03 to −1.24), −2.29 (−2.68 to −1.90), and finally −2.90 (−3.29 to −2.50) mm Hg systolic BP for the low, middle, and highest tertile of MVPA compared with “No MVPA” as the reference group after adjusting for age, sex, education, smoking and alcohol use. Further adjustment for body mass index attenuated the associations by 30% to 50%, but more MVPA remained significantly associated with lower BP and lower risk of hypertension. This association was age dependent. β Coefficients (95% CI) for the highest tertiles of commuting‐and‐leisure‐time MVPA were −1.67 (−2.20 to −1.15), −3.39 (−3.94 to −2.82) and −4.64 (−6.15 to −3.14) mm Hg systolic BP in adults <40, 40 to 60, and >60 years, respectively.
Conclusions: Higher commuting and leisure‐time but not occupational MVPA were significantly associated with lower BP and lower hypertension risk at all ages, but these associations were stronger in older adults.