# Precision, Recall, and F1

Say we’ve got a simple binary classification dataset.

```
from sklearn.datasets import load_breast_cancer
import numpy as np

# binary classification: 569 samples, 30 numeric features
data = load_breast_cancer()
X = data.data
y = data.target
print(X.shape, y.shape)
```

```
(569, 30) (569,)
```
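
Since precision and recall both revolve around the positive class, it helps to know the base rate up front. A quick check using `bincount` and the dataset's own `target_names`:

```
# class counts: index 0 is 'malignant', index 1 is 'benign'
print(data.target_names, np.bincount(y))
```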

And we throw an arbitrary model at it, fitting and scoring on the same data since we only care about the metrics here.

```
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X, y)

# probability of the positive class, thresholded at 0.5
true_probs = model.predict_proba(X)[:, 1]
preds = (true_probs > .5).astype(int)
```
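
Thresholding the positive-class probabilities at 0.5 should agree with what `model.predict` does for a binary problem; a quick sanity check:

```
# predict() applies the same 0.5 cutoff internally
assert np.array_equal(preds, model.predict(X))
```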

The model's predictions will distribute across the four cells of a confusion matrix.

```
from sklearn.metrics import confusion_matrix
from helpers import print_cm  # local pretty-printing helper

result = confusion_matrix(y, preds)
print_cm(result)
```

```
                Predicted
                False   True
Actual  False     198     14
        True        9    348
```
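
`print_cm` is a small local helper, not part of `sklearn`. A minimal sketch of what it might look like, assuming the layout above with actual labels on the rows (sklearn's convention):

```
import pandas as pd

def print_cm(cm, labels=('False', 'True')):
    # sklearn's confusion_matrix puts actual labels on the rows
    # and predicted labels on the columns
    df = pd.DataFrame(cm, index=list(labels), columns=list(labels))
    df.index.name = 'Actual'
    df.columns.name = 'Predicted'
    print(df)
```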

## Accuracy measures

As a measure of accuracy, we can calculate **precision** with the following

$\frac{TP}{TP + FP}$

or, “How good were we at predicting `True` when we said it’s `True`?”

```
# columns of the sklearn matrix are predictions,
# so TP = result[1][1] and FP = result[0][1]
precision = result[1][1] / (result[1][1] + result[0][1])
precision
```

```
0.96132596685082872
```

Similarly, **recall** is

$\frac{TP}{TP + FN}$

or, “How good were we at identifying all of the `True` examples in our set?”

```
# rows of the sklearn matrix are actual labels,
# so FN = result[1][0]
recall = result[1][1] / (result[1][1] + result[1][0])
recall
```

```
0.97478991596638653
```
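
Indexing into the matrix by hand is easy to get backwards. A less fragile sketch unpacks the four cells by name with `ravel`, which returns them in sklearn's row-major order:

```
# row-major order: tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y, preds).ravel()
print('Precision:', tp / (tp + fp))
print('Recall:', tp / (tp + fn))
```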

Both of these have simple implementations in `sklearn`

```
from sklearn.metrics import precision_score, recall_score
print('Precision:', precision_score(y, preds))
print('Recall:', recall_score(y, preds))
```

```
Precision: 0.961325966851
Recall: 0.974789915966
```
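
If you want both numbers at once, plus F1 and per-class breakdowns, `classification_report` bundles them into a single summary:

```
from sklearn.metrics import classification_report
print(classification_report(y, preds))
```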

### Graphically

Wikipedia has an excellent graphic representing these two metrics

```
from IPython.display import Image
Image('images/precision_recall.PNG')
```

## Best of both

Per the sklearn docs:

> A system with high recall but low precision returns many results, but most of its predicted labels are incorrect when compared to the training labels. A system with high precision but low recall is just the opposite, returning very few results, but most of its predicted labels are correct when compared to the training labels. An ideal system with high precision and high recall will return many results, with all results labeled correctly.
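
To make the tradeoff concrete, consider a degenerate classifier that predicts `True` for everything. It never misses a `True` example, so recall is perfect, but its precision collapses to the base rate of the positive class (about 357/569 here). A quick sketch:

```
# recall is 1.0 by construction; precision drops to the
# fraction of actual True labels in the data
all_true = np.ones_like(y)
print('Precision:', precision_score(y, all_true))
print('Recall:', recall_score(y, all_true))
```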

The `f1_score` is a statistic that “averages together” the two accuracy notions above via the harmonic mean

$\frac{2}{\frac{1}{precision} + \frac{1}{recall}}$

```
2 / ((1/precision) + (1/recall))
```

```
0.96801112656467303
```
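
You'll often see the same quantity written in its algebraically rearranged form, $2 \cdot \frac{precision \cdot recall}{precision + recall}$, which gives an identical result:

```
# the same harmonic mean, rearranged
2 * precision * recall / (precision + recall)
```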

Of course, it, too, is located in `sklearn.metrics`

```
from sklearn.metrics import f1_score
print('F1 score:', f1_score(y, preds))
```

```
F1 score: 0.968011126565
```