# Multi-Class Regression with SoftMax

*Note, these notes were taken in the context of Week 3 of Improving Deep Neural Networks*

When your prediction task extends beyond binary classification, you want to rely less on the sigmoid function and logistic regression. While you might see some success using them anyway and then doing some `numpy.max()` dancing over your results, a much cleaner approach is to use the *SoftMax* function.

### The Math

Essentially, softmax takes an arbitrary results vector, `Z`, and instead of applying our typical sigmoid function to it, does the following:

- Overwrites each value, `z_i`, with `t_i`, where

$t_i = e^{z_i}$

- Normalizes each value by the sum of all values in the vector (the activation function)

$a_i = \frac{t_i}{\sum_j t_j} = \frac{e^{z_i}}{\sum_j e^{z_j}}$

This has the convenient effect that all values in the vector `a` sum to 1, giving a rough “percent likelihood” assigned to each class.

- In terms of training, we can do Gradient Descent on this just fine, as the cost function is essentially the same as that for Logistic Regression, just generalized to multiple classes (see the loss below).
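
For concreteness, the usual cross-entropy cost on a single example, with one-hot label vector $y$ and softmax output $a$, is

$\mathcal{L}(a, y) = -\sum_{j} y_j \log a_j$

which collapses to the familiar logistic regression loss when there are only two classes.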

### A Simple Example

Say we’re trying to decide between 4 separate classes and wind up with a final vector that looks like this:

```
import numpy as np

# Raw output scores (logits) for the four classes
Z = np.array([5, 2, -1, 3])
Z
```

```
array([ 5,  2, -1,  3])
```

Determining the softmax likelihoods is easy enough, following the steps above:

```
# Step 1: exponentiate each raw score
T = np.exp(Z)
T
```

```
array([148.4131591 ,   7.3890561 ,   0.36787944,  20.08553692])
```

```
# Step 2: normalize by the total so the entries sum to 1
A = T / np.sum(T)
A
```

```
array([0.84203357, 0.04192238, 0.00208719, 0.11395685])
```
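
As a quick sanity check (my addition, not part of the steps above), the entries of `A` should sum to 1, making it a valid probability distribution:

```
# The softmax outputs form a probability distribution:
# every entry is positive and they sum to 1 (up to rounding).
assert np.isclose(A.sum(), 1.0)
```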

We can see that `Class_0` having a large value makes it likely, and conversely `Class_2` having a low value makes it unlikely, mirroring our Sigmoid Activation intuition.
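
Putting the pieces together, here is a minimal sketch of a reusable softmax function. The name `softmax` and the max-subtraction step are my own additions; subtracting `np.max(Z)` doesn’t change the result (softmax is shift-invariant) but keeps `np.exp` from overflowing on large scores:

```
def softmax(Z):
    # Shift scores so the largest is 0; this leaves the output
    # unchanged but prevents overflow in np.exp for large inputs.
    shifted = Z - np.max(Z)
    T = np.exp(shifted)
    return T / np.sum(T)

softmax(Z)
```

```
array([0.84203357, 0.04192238, 0.00208719, 0.11395685])
```

which matches (up to floating-point rounding) the `A` we computed by hand.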