ROC and AUC

You may have seen images like this 1000 times, with some memorized understanding that the more tightly the orange line hugs the top-left corner, the "better your model is."

%pylab inline
from IPython.display import Image

Image('images/basic_roc.PNG')
Populating the interactive namespace from numpy and matplotlib


Unpacking why that’s the case is pretty straightforward once you frame this info with a new fundamental assumption:

The whole goal of your model is to figure out how to separate your data points into two distributions.

Image('images/two_dist.PNG')


From here, you can set a decision threshold, which yields some confusion matrix.
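Before getting to the example pictured above, here's a quick sketch of the mechanics (the labels, scores, and threshold below are made up purely for illustration): all it takes to go from scores to a confusion matrix is a single comparison.

import numpy as np
from sklearn.metrics import confusion_matrix

# hypothetical true labels and model scores
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.3, 0.2])

# everything scored at or above the cutoff gets predicted True
threshold = 0.5
y_pred = (y_score >= threshold).astype(int)

# sklearn's convention: rows are the actual class, columns are the predicted class
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()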

Say in the example above, we’ve got

from helpers import print_cm

tn = 240
fn = 50
fp = 10
tp = 200

print_cm([[tn, fn], [fp, tp]])
                     Actual
                  False   True
Predicted  False    240     50
            True     10    200

It seems that we did a reasonable job mitigating false positives at this threshold.
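As an aside, print_cm comes from a local helpers module that isn't reproduced in this notebook. A minimal sketch, assuming all it needs to do is pretty-print a 2x2 nested list with Predicted on the rows and Actual on the columns, might look like:

def print_cm(cm):
    # cm is [[tn, fn], [fp, tp]]: rows are Predicted (False, True), columns are Actual (False, True)
    print('                     Actual')
    print('                  False   True')
    print('Predicted  False  {:>5}  {:>5}'.format(cm[0][0], cm[0][1]))
    print('            True  {:>5}  {:>5}'.format(cm[1][0], cm[1][1]))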

Stats

Using these values we can calculate, among other things

  • True Positive Rate (TPR)
  • False Positive Rate (FPR)

The True Positive Rate is a measure of “how many of ALL TRUE values did we call out?”

$\frac{TP}{TP + FN}$

tp / (tp + fn)
0.8

The False Positive Rate is a measure of “how many of ALL FALSE values did we incorrectly call True?”

$\frac{FP}{FP + TN}$

fp / (fp + tn)
0.04

But this was just for ONE cherry-picked, example threshold.

At scale, we can generate these statistics for many threshold values at once using sklearn.metrics.roc_curve() which, given inputs y_true and y_score, sweeps through a set of candidate thresholds (drawn from the distinct score values) and returns

fpr : array, shape = [>2]
    Increasing false positive rates such that element i is the false
    positive rate of predictions with score >= thresholds[i].

tpr : array, shape = [>2]
    Increasing true positive rates such that element i is the true
    positive rate of predictions with score >= thresholds[i].

thresholds : array, shape = [n_thresholds]
    Decreasing thresholds on the decision function used to compute
    fpr and tpr. `thresholds[0]` represents no instances being predicted
    and is arbitrarily set to `max(y_score) + 1`.
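Here’s a tiny, self-contained example (the labels and scores are invented) just to show the shape of what comes back:

import numpy as np
from sklearn.metrics import roc_curve

# four made-up labels and scores
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = roc_curve(y_true, y_score)

print(fpr)         # increasing false positive rates
print(tpr)         # matching true positive rates
print(thresholds)  # decreasing score cutoffs, one per (fpr, tpr) pair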

Good Model, Bad Model

To restate our earlier assumption, “The whole goal of your model is to figure out how to separate your data points into two distributions.”

Extending this notion further, we say that a model is “Good” when it separates the True and False populations as above. Conversely, a model is not so good when it can’t separate the two classes, as below.

Image('images/bad_model.PNG')


Bringing it home, look at the difference between the ROC curves for the two models.

You can go into production with whatever threshold value you want, but one model is objectively better than the other: at every threshold, it does as good a job or better of separating the two populations.
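You can simulate this for yourself. The sketch below (entirely made-up Gaussian score distributions) builds a “good” model whose True and False scores barely overlap and a “bad” one whose scores sit nearly on top of each other, then overlays their ROC curves:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.RandomState(0)
n = 1000
y_true = np.r_[np.zeros(n), np.ones(n)]

# "good" model: the False and True score distributions are well separated
good_scores = np.r_[rng.normal(0.3, 0.1, n), rng.normal(0.7, 0.1, n)]
# "bad" model: the two distributions mostly overlap
bad_scores = np.r_[rng.normal(0.45, 0.1, n), rng.normal(0.55, 0.1, n)]

for name, scores in [('good', good_scores), ('bad', bad_scores)]:
    fpr, tpr, _ = roc_curve(y_true, scores)
    plt.plot(fpr, tpr, label='{} (AUC={:.2f})'.format(name, roc_auc_score(y_true, scores)))

plt.plot([0, 1], [0, 1], linestyle='--')
plt.legend();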

One Score

While the two-population, sliding-threshold graphic is helpful for reinforcing intuition, all you really need to know is that maximizing AUC means maximizing the degree of class separability in your model.

From there, you can do whatever fits your use case. Maybe you have a higher tolerance for False Positives and want to prioritize finding as many True Positives as possible; a better model means better separability, which means more bang for your buck when adjusting the threshold.
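One way to make “degree of separability” concrete: AUC is exactly the probability that a randomly chosen True example is scored higher than a randomly chosen False example. A small sketch (with made-up scores) to convince yourself:

import numpy as np
from sklearn.metrics import roc_auc_score

# invented scores, drawn so the classes partially overlap
rng = np.random.RandomState(1)
y_true = np.r_[np.zeros(500), np.ones(500)]
y_score = np.r_[rng.normal(0.4, 0.15, 500), rng.normal(0.6, 0.15, 500)]

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]

# fraction of (True, False) pairs where the True example out-scores the False one
pairwise = (pos[:, None] > neg[None, :]).mean()

print(roc_auc_score(y_true, y_score), pairwise)  # these two numbers agree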

Sklearn

For concreteness, let’s use a simple dummy model to see how to examine AUC statistics in scikit-learn.

# Make a dummy dataset and train a simple Logistic Regression

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification()

model = LogisticRegression()

model.fit(X, y)
y_pred_probs = model.predict_proba(X)[:, 1]
# examine FPR and TPR at various acceptance thresholds
from sklearn.metrics import roc_curve

fpr, tpr, _ = roc_curve(y, y_pred_probs)
# plot the two against each other
plt.plot(fpr, tpr)
# and an x=y line to show break-even
plt.plot(fpr, fpr);


Pretty good for a dummy model on dummy data.

from sklearn.metrics import roc_auc_score

roc_auc_score(y, y_pred_probs)
0.93959999999999999

Now say we’ve decided on an acceptance threshold (in this case .5, which is effectively what model.predict() uses under the hood) and we want to take a look at our Confusion Matrix.

y_pred = model.predict(X)
from sklearn.metrics import confusion_matrix

print_cm(confusion_matrix(y, y_pred))
                     Actual
                  False   True
Predicted  False     43      7
            True      5     45
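If .5 turns out not to be the right operating point, nothing stops you from picking a different one off the ROC curve. Here’s a sketch (the 10% FPR budget is an invented business constraint) that reuses y and y_pred_probs from above:

import numpy as np
from sklearn.metrics import roc_curve, confusion_matrix

fpr, tpr, thresholds = roc_curve(y, y_pred_probs)

# fpr is increasing, so the last point within budget gives the best TPR we can afford
idx = np.where(fpr <= 0.10)[0][-1]
custom_threshold = thresholds[idx]

y_pred_custom = (y_pred_probs >= custom_threshold).astype(int)
print_cm(confusion_matrix(y, y_pred_custom))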

Resources

This video and the accompanying webapp were an enormous help in preparing this notebook.