Modeling and inference
Outliers are observations that fall far from the main cloud of points.
They can be outlying in:
However, being outlying in a univariate sense does not always mean being outlying from the bivariate model.
Points that are in-line with the bivariate model usually do not influence the least squares line, even if they are extreme in \(x\), \(y\), or both.
Outliers: Points or groups of points that stand out from the rest of the data.
Leverage points: Points that fall horizontally away from the center of the cloud tend to pull harder on the line, so we call them points with high leverage or leverage points.
Influential points: Outliers, generally high leverage points, that actually alter the slope or position of the regression line.
Test your analysis with and without outliers.
Compare and discuss the impact of outliers on model fit.
Present both models to stakeholders to choose the most reasonable interpretation.
Warning
Removing outliers should only be done with strong justification – excluding interesting or extreme cases can lead to misleading models, poor predictive performance, and flawed conclusions.