Evaluation using Residual Plots
Goal
One of the key assumptions of any linear model is homoscedasticity, which (apart from being a $2 word that I need to spellcheck every time) means that the variance of your error terms stays roughly constant across all fitted values.
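In symbols, that amounts to assuming Var(e_i | X_i) = sigma^2 for every observation i: the spread of the errors shouldn’t depend on where you are in the data.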
Thus, it becomes essential to collect your residuals and examine them against your predicted values, and, if you’re being really diligent, against your independent X variables as well.
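If you want to roll one of these plots by hand before reaching for any particular library, the recipe is simply: fit a model, subtract its predictions from the true values, and scatter the result against those predictions. Here’s a minimal sketch on made-up data (not the dataset used later in this post), using plain scikit-learn and matplotlib:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Made-up data whose noise grows with X, to force a "funnel" shape
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + rng.normal(scale=X.ravel(), size=200)

# Fit, predict, and look at (actual - predicted) against the predictions
model = LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted

plt.scatter(fitted, residuals, alpha=0.5)
plt.axhline(0, color='k', linestyle='--')
plt.xlabel('fitted values')
plt.ylabel('residuals');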
Visually, this means we want to see something like the first of the two pictures below, not the second.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

from IPython.display import Image
Image('images/scedasticity.jpg')
Note: the “funnel shape” of the dataset showing heteroscedasticity
A Practical Example
In researching the easiest way to put these plots together in Python, I stumbled upon the Yellowbrick library. It provides, among other things, a nice visualization wrapper around sklearn objects for doing visual statistical inference.
Borrowing from their docs, we’ll load one of their sample datasets, fit a simple model, then show its residual plot.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge

from yellowbrick.datasets import load_bikeshare
from yellowbrick.regressor import ResidualsPlot

# Load the sample bikeshare data and hold out a test set
X, y = load_bikeshare()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Wrap a simple Ridge regression in Yellowbrick's residuals visualizer
model = Ridge()
visualizer = ResidualsPlot(model, size=(400, 400))

visualizer.fit(X_train, y_train)    # fits the underlying model
visualizer.score(X_test, y_test)    # computes residuals on the test set
visualizer.poof();                  # draws the plot (renamed show() in newer Yellowbrick releases)
As you can see, there’s a very clear “funnel shape” that demonstrates *hetero*scedasticity in our data. Furthermore, we can tell by looking at the histogram to the right that the residual values aren’t even centered around 0.
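As an aside, if you want a number to back up the picture, statsmodels ships a Breusch-Pagan test that can be run on the same residuals. Here’s a rough sketch (not something Yellowbrick provides), reusing the fitted Ridge model and the train/test split from the cell above:

import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Residuals of the Ridge model on the held-out data
resid = y_test - model.predict(X_test)

# Breusch-Pagan regresses the squared residuals on the features,
# so the exogenous matrix needs an explicit constant column
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, sm.add_constant(X_test))

# A tiny p-value is evidence against homoscedasticity
print(f'Breusch-Pagan LM p-value: {lm_pvalue:.4g}')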
To address this, the authors of ISL advocate applying some concave (downward-curving) function to our target variable, y, to shrink the error in the larger responses.
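One practical wrinkle with transforming y is that the model then predicts on the transformed scale, so you have to remember to undo the transform before interpreting anything. As an aside, scikit-learn’s TransformedTargetRegressor can manage that round trip for you; here’s a rough sketch, using log1p/expm1 so that a zero count wouldn’t blow up:

import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge

# Fit Ridge on log1p(y), but score and predict on the original scale
log_ridge = TransformedTargetRegressor(
    regressor=Ridge(),
    func=np.log1p,          # concave transform applied to y before fitting
    inverse_func=np.expm1,  # undoes the transform on the predictions
)
log_ridge.fit(X_train, y_train)
print(log_ridge.score(X_test, y_test))  # R^2 back on the original scale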
Scaling
The log function is a typical candidate for such a transformation, as it has the concave shape we’re looking for:
X = np.log(np.linspace(1, 100, 100))
plt.plot(X);
Same code block as above, except we run all of the y values through np.log():
X, y = load_bikeshare()
scaled_y = np.log(y)
X_train, X_test, y_train, y_test = train_test_split(X, scaled_y, test_size=0.2, random_state=42)
model = Ridge()
visualizer = ResidualsPlot(model, size=(400, 400))
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.poof();
However, it appears that not only did we overshoot in wrenching the mean residual value down, we also created what looks to be a “reverse funnel” after ~4 on the X-axis.
What about the square root function?
X = np.sqrt(np.linspace(0, 10, 100))
plt.plot(X);
X, y = load_bikeshare()
scaled_y = np.sqrt(y)
X_train, X_test, y_train, y_test = train_test_split(X, scaled_y, test_size=0.2, random_state=42)
model = Ridge()
visualizer = ResidualsPlot(model, size=(400, 400))
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.poof();
Better, but we still have a distinct pickup before 10 or so. We could also examine the residuals against each of our features, but it’s probably worth trying a non-linear model at this point, as it might be better suited to this problem.
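For what it’s worth, swapping a non-linear model into the same visualizer is only a line or two of change. Here’s a rough sketch with a random forest (my pick for illustration, not something this post settles on):

from sklearn.ensemble import RandomForestRegressor

X, y = load_bikeshare()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Same residual-plot machinery, different (non-linear) estimator
model = RandomForestRegressor(n_estimators=100, random_state=42)
visualizer = ResidualsPlot(model, size=(400, 400))
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.poof();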