Modeling and inference
\(\mathbf{x}\): features of an email, such as word and character counts
\[ y = \begin{cases} 1 & \text{it's spam}\\ 0 & \text{it's legit} \end{cases} \]
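As a rough illustration of what \(\mathbf{x}\) might hold, the sketch below turns the text of an email into a few count-based features. The function name `extract_features` and the particular features chosen (characters, words, exclamation marks) are assumptions made for this example, not a prescribed feature set.

```python
def extract_features(email_text: str) -> dict[str, int]:
    """Return a few simple count-based features for one email (illustrative only)."""
    words = email_text.split()  # split on whitespace
    return {
        "n_chars": len(email_text),               # total characters, spaces included
        "n_words": len(words),                    # whitespace-separated tokens
        "n_exclamations": email_text.count("!"),  # exclamation marks
    }


example = "Congratulations! Claim your reward now!"
features = extract_features(example)
print(features)
```

A real filter would likely use many more features, but each email still reduces to a vector \(\mathbf{x}\) paired with a label \(y\).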
Subject: Congratulations! You’ve Been Selected for an Exclusive Reward 🎁
Dear Customer,
You have been chosen as one of our preferred recipients to receive a special complimentary gift. This is our way of thanking you for your continued interest in our services.
To claim your reward, simply complete our short survey. Your participation takes only 60 seconds, and your prize will be shipped at no cost to you.
Click here to start your survey and claim your reward [Claim Reward Link]
This exclusive offer is available for the next 48 hours only. Don’t miss your chance to enjoy this limited opportunity.
Warm regards,
Promotions Team
Exclusive Rewards Center
| | Email is spam | Email is not spam |
|---|---|---|
| Email labelled spam | True positive | False positive (Type 1 error) |
| Email labelled not spam | False negative (Type 2 error) | True negative |
\[ \text{False negative rate} = P(\text{labelled not spam} \mid \text{email is spam}) = \frac{FN}{TP + FN} \]
\[ \text{False positive rate} = P(\text{labelled spam} \mid \text{email is not spam}) = \frac{FP}{FP + TN} \]
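The two error rates can be computed directly from true and predicted labels. The following is a minimal sketch using made-up labels; the helper names `confusion_counts` and `error_rates` are hypothetical, and the coding 1 = spam, 0 = legit follows the definition of \(y\).

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN, where 1 = spam and 0 = not spam."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def error_rates(tp, fp, fn, tn):
    """Return (false negative rate, false positive rate)."""
    fnr = fn / (tp + fn)  # share of spam emails labelled not spam
    fpr = fp / (fp + tn)  # share of legit emails labelled spam
    return fnr, fpr


# Made-up data: 4 spam emails followed by 6 legit emails.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
fnr, fpr = error_rates(tp, fp, fn, tn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"False negative rate = {fnr:.3f}, false positive rate = {fpr:.3f}")
```

Note that each rate is conditional on the true class, so the denominators are the number of spam emails and the number of legit emails, not the total number of emails.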
\[ \text{Sensitivity} = P(\text{labelled spam} \mid \text{email is spam}) = \frac{TP}{TP + FN} \]
\[ \text{Specificity} = P(\text{labelled not spam} \mid \text{email is not spam}) = \frac{TN}{FP + TN} \]
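Every spam email is labelled either spam or not spam, and likewise for every legit email, so sensitivity and specificity are the complements of the two error rates:

\[ \text{Sensitivity} = \frac{TP}{TP + FN} = 1 - \frac{FN}{TP + FN} = 1 - \text{False negative rate} \]
\[ \text{Specificity} = \frac{TN}{FP + TN} = 1 - \frac{FP}{FP + TN} = 1 - \text{False positive rate} \]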
If you were designing a spam filter, would you want sensitivity and specificity to be high or low? What are the trade-offs associated with each decision?
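One way to explore this question is to suppose the filter assigns each email a spam score between 0 and 1 and labels it spam when the score reaches a threshold. The sketch below uses made-up scores and a hypothetical helper, `sensitivity_specificity`, to show how the two quantities move as the threshold changes; it is meant as a starting point for discussion, not a recommended design.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Label an email spam when its score >= threshold, then compute both rates."""
    tp = fp = fn = tn = 0
    for truth, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 0 and predicted == 1:
            fp += 1
        elif truth == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (fp + tn)


# Made-up data: 4 spam emails (1) and 4 legit emails (0) with assumed scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.30, 0.70, 0.40, 0.20, 0.05]

results = {}
for threshold in (0.25, 0.50, 0.75):
    sens, spec = sensitivity_specificity(y_true, scores, threshold)
    results[threshold] = (sens, spec)
    print(f"threshold={threshold:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

With these particular scores, lowering the threshold catches more spam but flags more legit email, while raising it does the reverse.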