Evaluation using Residual Plots

Goal

One of the key assumptions of any linear model is homoscedasticity, which, apart from being a $2 word that I need to spellcheck every time, means that the variance of your error terms stays roughly constant across all fitted values.

Thus, it becomes essential to collect your residuals and examine them against your predicted values, and, if you're being really diligent, against each of your independent X variables as well.
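For a sense of what that check looks like without any extra libraries, here's a minimal, hand-rolled sketch using plain sklearn and matplotlib on made-up, well-behaved data (none of the names or numbers here carry over to the real example later):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Made-up data purely to illustrate the check: y is linear in x with
# constant-variance noise, so the residuals should form a level band
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(500, 1))
y = 3 * X.ravel() + rng.normal(scale=2, size=500)

model = LinearRegression().fit(X, y)
preds = model.predict(X)
residuals = y - preds

# Residuals vs. fitted values: we want a shapeless band around zero
plt.scatter(preds, residuals, alpha=0.3)
plt.axhline(0, color='black', linewidth=1)
plt.xlabel('Predicted value')
plt.ylabel('Residual');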

Visually, this basically means that we want to see the former, not the latter, of the two pictures below.

%pylab inline
from IPython.display import Image

Image('images/scedasticity.jpg')
Populating the interactive namespace from numpy and matplotlib

[Figure: side-by-side residual plots, homoscedastic vs. heteroscedastic]

Note: The “funnel shape” of the dataset showing Heteroscedasticity

A Practical Example

In researching the easiest way to put these plots together in Python, I stumbled upon the Yellowbrick library. It provides, among other things, a nice visualization wrapper around sklearn objects for doing visual model diagnostics.

Borrowing from their docs, we’ll load one of their sample datasets, fit a simple model, then show its residual plot.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from yellowbrick.datasets import load_bikeshare
from yellowbrick.regressor import ResidualsPlot

X, y = load_bikeshare()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = Ridge()
visualizer = ResidualsPlot(model, size=(400, 400))

visualizer.fit(X_train, y_train)  
visualizer.score(X_test, y_test)  
visualizer.poof();

[Residuals plot: Ridge fit on the raw bikeshare target]

As you can see, there’s a very clear “funnel shape” that demonstrates *hetero*scedasticity in our data. Furthermore, we can tell by looking at the histogram to the right that the residual values aren’t even centered around 0.

To address this, the authors of ISL (An Introduction to Statistical Learning) advocate applying some concave (downward-curving) function to our target variable, y, to shrink the error in the larger responses.
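A quick bit of arithmetic (with made-up numbers, purely for illustration) shows what "shrinking the error in the larger responses" means in practice: on the log scale, equal multiplicative gaps in y become equal additive gaps, so the spread among large values gets compressed far more than the spread among small ones.

import numpy as np

y_small = np.array([10, 20])      # a gap of 10 at the low end
y_large = np.array([1000, 2000])  # a gap of 1000 at the high end

print(np.diff(np.log(y_small)))   # [0.693...]
print(np.diff(np.log(y_large)))   # [0.693...] -- the same gap after the transform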

Scaling

The log function is a typical candidate for such a transformation, as it has the concave shape we're looking for.

# Plot log(x) over [1, 100] to show its concave shape
# (using a throwaway variable so we don't clobber our feature matrix, X)
xs = np.linspace(1, 100, 100)
plt.plot(xs, np.log(xs));

[Plot: np.log over 1 to 100, showing its concave shape]

This is the same code block as above, except that we run all of the y values through np.log() before splitting.

X, y = load_bikeshare()
scaled_y = np.log(y)

X_train, X_test, y_train, y_test = train_test_split(X, scaled_y, test_size=0.2, random_state=42)

model = Ridge()
visualizer = ResidualsPlot(model, size=(400, 400))

visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.poof();

[Residuals plot: Ridge fit on the log-transformed target]

However, it appears that we not only overshot in pulling the mean residual value down, we also created what looks like a "reverse funnel" after ~4 on the x-axis.

What about the square root function?

# sqrt is also concave, but it compresses large values less aggressively than log
xs = np.linspace(0, 10, 100)
plt.plot(xs, np.sqrt(xs));

[Plot: np.sqrt over 0 to 10]

X, y = load_bikeshare()
scaled_y = np.sqrt(y)

X_train, X_test, y_train, y_test = train_test_split(X, scaled_y, test_size=0.2, random_state=42)

model = Ridge()
visualizer = ResidualsPlot(model, size=(400, 400))

visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.poof();

[Residuals plot: Ridge fit on the square-root-transformed target]

Better, but we still have a distinct pickup in spread before 10 or so. We could also examine the residuals against each of our features, but at this point it's probably worth trying a non-linear model, as it might be better suited to this problem.
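To sketch out both of those next steps (neither is from the Yellowbrick docs; the per-feature subplot grid and the RandomForestRegressor swap are just my own illustrative choices, and I'm assuming load_bikeshare() hands back a pandas DataFrame), something like this would do:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from yellowbrick.datasets import load_bikeshare
from yellowbrick.regressor import ResidualsPlot

X, y = load_bikeshare()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 1) Residuals against each individual feature, with plain matplotlib
model = Ridge().fit(X_train, y_train)
residuals = y_test - model.predict(X_test)

n_cols = 4
n_rows = int(np.ceil(X_test.shape[1] / n_cols))
fig, axes = plt.subplots(n_rows, n_cols, figsize=(12, 2.5 * n_rows), sharey=True)
for ax, col in zip(axes.ravel(), X_test.columns):
    ax.scatter(X_test[col], residuals, alpha=0.2)
    ax.axhline(0, color='black', linewidth=1)
    ax.set_title(col)
plt.tight_layout()

# 2) The same ResidualsPlot workflow, but with a non-linear model
visualizer = ResidualsPlot(RandomForestRegressor(n_estimators=50, random_state=42),
                           size=(400, 400))
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.poof();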