Modeling and inference
Numerical outcome and one numerical predictor:
Numerical outcome and one categorical predictor (two levels):
Numerical outcome, numerical and categorical predictors:
\[ y = \begin{cases} 1 & \text{e.g. Yes, Win, True, Heads, Success}\\ 0 & \text{e.g. No, Lose, False, Tails, Failure}. \end{cases} \]
If we can model the relationship between predictors (\(x\)) and a binary outcome (\(y\)), we can use the model to do a special kind of prediction called classification.
\(\mathbf{x}\): Word and character counts in an e-mail
\[ y = \begin{cases} 1 & \text{it's spam}\\ 0 & \text{it's legit} \end{cases} \]
Subject: Congratulations! You’ve Been Selected for an Exclusive Reward 🎁
Dear Customer,
You have been chosen as one of our preferred recipients to receive a special complimentary gift. This is our way of thanking you for your continued interest in our services.
To claim your reward, simply complete our short survey. Your participation takes only 60 seconds, and your prize will be shipped at no cost to you.
Click here to start your survey and claim your reward [Claim Reward Link]
This exclusive offer is available for the next 48 hours only. Don’t miss your chance to enjoy this limited opportunity.
Warm regards,
Promotions Team
Exclusive Rewards Center
\(\mathbf{x}\): features in a medical image
\[ y = \begin{cases} 1 & \text{it's cancer}\\ 0 & \text{it's healthy} \end{cases} \]
\(\mathbf{x}\): financial and demographic info about a loan applicant
\[ y = \begin{cases} 1 & \text{applicant is at risk of defaulting on loan}\\ 0 & \text{applicant is safe} \end{cases} \]
\(\mathbf{x}\): info about a criminal suspect and their case
\[ y = \begin{cases} 1 & \text{suspect is at risk of re-offending pre-trial}\\ 0 & \text{suspect is safe} \end{cases} \]
Instead of modeling \(y\) directly, we model the probability that \(y=1\):
Recall regression with a numerical outcome:
Similar when modeling a binary outcome:
It’s the logistic function:
\[ \text{Prob}(y = 1) = \frac{e^{\beta_0+\beta_1x}}{1+e^{\beta_0+\beta_1x}}. \]
If you set \(p = \text{Prob}(y = 1)\) and do some algebra, you get a simple linear model for the log-odds:
\[ \log\left(\frac{p}{1-p}\right) = \beta_0+\beta_1x. \]
This is called the logistic regression model.
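The algebra behind this step can be sketched as follows, writing \(\eta = \beta_0+\beta_1x\) as shorthand (the symbol \(\eta\) is introduced here only for convenience and is not part of the original notation):

\[ \begin{aligned} p &= \frac{e^{\eta}}{1+e^{\eta}}, \\ 1 - p &= \frac{1}{1+e^{\eta}}, \\ \frac{p}{1-p} &= e^{\eta}, \\ \log\left(\frac{p}{1-p}\right) &= \eta = \beta_0+\beta_1x. \end{aligned} \]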
\(p = \text{Prob}(y = 1)\) is a probability – a number between 0 and 1
\(p / (1 - p)\) is the odds – a number between 0 and \(\infty\)
The log-odds \(\log(p / (1 - p))\) is a number between \(-\infty\) and \(\infty\), which is suitable for the linear model
\[ \log\left(\frac{p}{1-p}\right) = \beta_0+\beta_1x \]
The logit function \(\log(p / (1-p))\) is an example of a link function: it maps the probability onto the whole real line, so the linear model on the right-hand side has an appropriate range
This is an example of a generalized linear model
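To make the three scales concrete, here is a minimal Python sketch that converts between probability, odds, and log-odds. The function names `odds`, `logit`, and `logistic` are chosen here for illustration and do not come from the slides.

```python
import math

def odds(p):
    """Odds p / (1 - p) for a probability strictly between 0 and 1."""
    return p / (1 - p)

def logit(p):
    """Log-odds log(p / (1 - p)); maps (0, 1) onto the whole real line."""
    return math.log(odds(p))

def logistic(eta):
    """Inverse of logit: e^eta / (1 + e^eta), written to avoid overflow."""
    if eta >= 0:
        return 1 / (1 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1 + e)

for p in (0.1, 0.5, 0.9):
    # Going to the log-odds scale and back should recover p.
    print(f"p={p:.1f}  odds={odds(p):.3f}  "
          f"log-odds={logit(p):+.3f}  back to p={logistic(logit(p)):.1f}")
```

A probability of 0.5 corresponds to odds of 1 and log-odds of 0; probabilities below 0.5 give negative log-odds and probabilities above 0.5 give positive log-odds.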
We estimate the parameters \(\beta_0\), \(\beta_1\), etc. using maximum likelihood (don’t worry about it) to get the “best fitting” S-curve
The fitted model is
\[ \log\left(\frac{\widehat{p}}{1-\widehat{p}}\right) = b_0+b_1x \]
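For the curious, maximum likelihood can be sketched in a few lines of plain Python. This is a minimal illustration, assuming one predictor and using Newton-Raphson steps; it is one common way to maximize the likelihood, not necessarily what any particular software does internally. The data `x_toy` and `y_toy` are made up purely for illustration.

```python
import math

def logistic(eta):
    return 1 / (1 + math.exp(-eta))

def fit_logistic(x, y, n_iter=25):
    """Newton-Raphson for the maximum likelihood estimates (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = [logistic(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the negated Hessian, with weights p * (1 - p).
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up toy data (not from the slides): the outcomes overlap,
# so the maximum likelihood estimates are finite.
x_toy = [1, 2, 3, 4, 5, 6, 7, 8]
y_toy = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x_toy, y_toy)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

At the fitted values the gradient of the log-likelihood is zero, which is what "best fitting" means here: no small change to \(b_0\) or \(b_1\) makes the observed data more probable.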
Select a number \(0 < p^* < 1\):
Solve for the x-value that matches the threshold:
A new data point is observed with \(x_{\text{new}}\). Which side of the boundary is it on?
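The thresholding steps can be sketched in Python. Setting the fitted log-odds equal to \(\log(p^*/(1-p^*))\) and solving gives the boundary \(x^* = \left(\log(p^*/(1-p^*)) - b_0\right)/b_1\). The coefficient values below are hypothetical, chosen only so the arithmetic is easy to follow, and the function names are illustrative.

```python
import math

def decision_boundary(b0, b1, p_star=0.5):
    """x-value where the fitted probability equals the threshold p_star."""
    return (math.log(p_star / (1 - p_star)) - b0) / b1

def classify(x_new, b0, b1, p_star=0.5):
    """Predict 1 when the fitted probability at x_new exceeds p_star."""
    p_hat = 1 / (1 + math.exp(-(b0 + b1 * x_new)))
    return 1 if p_hat > p_star else 0

# Hypothetical fitted coefficients (assumed for illustration only).
b0, b1 = -4.5, 1.0

x_star = decision_boundary(b0, b1, p_star=0.5)
print("boundary:", x_star)
print("x_new = 3.0 ->", classify(3.0, b0, b1))
print("x_new = 6.0 ->", classify(6.0, b0, b1))
```

When \(b_1 > 0\), predicting 1 is the same as \(x_{\text{new}}\) falling to the right of \(x^*\); when \(b_1 < 0\) the sides swap. Raising \(p^*\) moves the boundary so that fewer points are classified as 1.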
Two numerical predictors and one binary outcome:
On the probability scale:
\[ \text{Prob}(y = 1) = \frac{e^{\beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_mx_m}}{1+e^{\beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_mx_m}}. \]
For the log-odds, a multiple linear regression:
\[ \log\left(\frac{p}{1-p}\right) = \beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_mx_m. \]
It’s linear! Consider two numerical predictors:
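With two predictors, setting the fitted log-odds equal to the threshold's log-odds gives \(b_0 + b_1x_1 + b_2x_2 = \log(p^*/(1-p^*))\), which is a straight line in the \((x_1, x_2)\) plane. A minimal Python sketch, assuming hypothetical fitted coefficients `b0`, `b1`, `b2` (not estimated from any real data) and illustrative function names:

```python
import math

def boundary_x2(x1, b0, b1, b2, p_star=0.5):
    """x2 on the decision boundary for a given x1 (requires b2 != 0)."""
    return (math.log(p_star / (1 - p_star)) - b0 - b1 * x1) / b2

def classify2(x1, x2, b0, b1, b2, p_star=0.5):
    """Predict 1 when the fitted probability at (x1, x2) exceeds p_star."""
    eta = b0 + b1 * x1 + b2 * x2
    p_hat = 1 / (1 + math.exp(-eta))
    return 1 if p_hat > p_star else 0

# Hypothetical fitted coefficients (assumed for illustration only).
b0, b1, b2 = -3.0, 1.0, 2.0

# Points on the boundary for p* = 0.5: -3 + x1 + 2*x2 = 0.
line = [(x1, boundary_x2(x1, b0, b1, b2)) for x1 in (0.0, 1.0, 2.0)]
print(line)

print(classify2(2.0, 2.0, b0, b1, b2))  # a point on one side of the line
print(classify2(0.0, 0.0, b0, b1, b2))  # a point on the other side
```

Equal steps in `x1` produce equal steps in the boundary's `x2`, which is exactly what it means for the boundary to be a line. Changing \(p^*\) shifts the line without changing its slope.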