Modeling and inference
Data: critics and audience scores from movie_scores
A regression model is a function that describes the relationship between the outcome, \(Y\), and the predictor, \(X\).
\[ \begin{aligned} Y &= \color{black}{\textbf{Model}} + \text{Error} \\[8pt] &= \color{black}{\mathbf{f(X)}} + \epsilon \\[8pt] &= \color{black}{\boldsymbol{\mu_{Y|X}}} + \epsilon \end{aligned} \]
Use simple linear regression to model the relationship between a numerical outcome (\(Y\)) and a single numerical predictor (\(X\)): \[\Large{Y = \beta_0 + \beta_1 X + \epsilon}\]
\[ \Large{\hat{Y} = b_0 + b_1 X} \]
\[ \text{residual} = \text{observed} - \text{predicted} = y - \hat{y} \]
\[ e_i = \text{observed} - \text{predicted} = y_i - \hat{y}_i \]
\[ e^2_1 + e^2_2 + \dots + e^2_n \]
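To make the residual and sum-of-squares formulas concrete, here is a minimal Python sketch. The data values and the two candidate lines are made up for illustration (they are not the movie_scores data), and the function names are hypothetical.

```python
def residuals(x, y, b0, b1):
    """Residuals e_i = y_i - y_hat_i for the line y_hat = b0 + b1 * x."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared residuals: e_1^2 + e_2^2 + ... + e_n^2."""
    return sum(e ** 2 for e in residuals(x, y, b0, b1))


# Made-up illustrative values (assumption), not the movie_scores data
x = [10, 30, 50, 70, 90]
y = [35, 50, 55, 72, 78]

# Two candidate lines; the one with the smaller sum fits these points better
ssr_a = sum_squared_residuals(x, y, 31.0, 0.54)
ssr_b = sum_squared_residuals(x, y, 40.0, 0.40)
print(ssr_a, ssr_b)
```

Comparing the two sums shows how the criterion ranks candidate lines: a smaller sum of squared residuals means the line sits closer to the observed points overall.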
The slope of the model for predicting audience score from critics score is 0.519. Which of the following is the best interpretation of this value?
\[ \widehat{\text{audience}} = 32.3 + 0.519 \times \text{critics} \]
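One way to read the fitted equation is to compute predictions from it. The sketch below uses the coefficients 32.3 and 0.519 from the equation above; the function name and the example critics scores (50 and 51) are assumptions chosen for illustration.

```python
B0 = 32.3   # intercept from the fitted equation
B1 = 0.519  # slope from the fitted equation


def predict_audience(critics):
    """Predicted audience score for a given critics score."""
    return B0 + B1 * critics


pred_50 = predict_audience(50)
pred_51 = predict_audience(51)

# The gap between predictions one critics point apart equals the slope
slope_check = pred_51 - pred_50
print(pred_50, pred_51, slope_check)
```

The difference between the two predictions is the slope: for each additional point in critics score, the predicted audience score is higher by 0.519 points, on average. Setting `critics` to 0 returns the intercept, 32.3.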
✅ The intercept is meaningful in the context of the data if the predictor can feasibly take values equal to or near zero.
🛑 Otherwise, it might not be meaningful!
The least squares regression line minimizes the sum of squared residuals. It has the following properties:
Goes through the center of mass point (the coordinates corresponding to average \(X\) and average \(Y\)): \(b_0 = \bar{Y} - b_1~\bar{X}\)
Slope of the line has the same sign as the correlation coefficient: \(b_1 = r \frac{s_Y}{s_X}\)
Sum of the residuals is zero: \(\sum_{i = 1}^n e_i = 0\)
Residuals and \(X\) values are uncorrelated
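The properties above can be checked numerically. The following is a minimal sketch on made-up data (an assumption, not the movie_scores data); it builds the line from \(b_1 = r \frac{s_Y}{s_X}\) and \(b_0 = \bar{Y} - b_1~\bar{X}\), then confirms each property up to floating-point rounding.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


# Made-up illustrative values (assumption), not the movie_scores data
x = [10, 30, 50, 70, 90]
y = [35, 50, 55, 72, 78]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

b1 = r * s_y / s_x       # slope has the same sign as r
b0 = y_bar - b1 * x_bar  # forces the line through (x_bar, y_bar)

e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sum_e = sum(e)  # approximately 0
# Sample covariance of x and e; uses the fact that the residuals have mean 0
cov_xe = sum((xi - x_bar) * ei for xi, ei in zip(x, e)) / (n - 1)  # approximately 0
print(b0, b1, sum_e, cov_xe)
```

Because of floating-point arithmetic, the residual sum and the covariance come out as values very close to zero rather than exactly zero.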