Multi-Class Regression with SoftMax
Note: these notes were taken in the context of Week 3 of Improving Deep Neural Networks.
When your prediction task extends beyond binary classification, you want to rely less on the sigmoid function and Logistic Regression. While you might see some success using them anyway and then doing some numpy.max() dancing over your results, a much cleaner approach is to use the SoftMax function.
The Math
Essentially, softmax takes an arbitrary results vector, $Z$, and instead of applying our typical sigmoid function to it, does the following:
- Overwrites each value, $z_i$, with $t_i$, where $t_i = e^{z_i}$
- Normalizes each value by the sum of all values in the vector (the activation function): $a = \frac{e^{Z}}{\sum_i t_i}$
This has the convenient effect of all values in the vector $a$ summing to 1 – a rough “percent likelihood” value assigned to each cell.
- In terms of training, we can do Gradient Descent on this just fine, as the cost function is essentially the same as that for Logistic Regression, just extended to cover every class (written out just below).
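To make that concrete, one common way to write the per-example cost, assuming a one-hot label vector $y$ and the softmax output $\hat{y} = a$ over $C$ classes (my notation, not the course’s), is the cross-entropy loss
$\mathcal{L}(\hat{y}, y) = -\sum_{j=1}^{C} y_j \log \hat{y}_j$
which reduces to the familiar Logistic Regression loss when $C = 2$.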
A Simple Example
Say we’re trying to decide between 4 separate classes and wind up with a final vector that looks like
import numpy as np
Z = np.array([5, 2, -1, 3])  # raw scores (logits), one per class
Z
array([ 5, 2, -1, 3])
Determining the softmax likelihoods is easy enough, following along with the steps above:
T = np.exp(Z)  # exponentiate each score
T
array([148.4131591 , 7.3890561 , 0.36787944, 20.08553692])
A = T / np.sum(T)  # normalize so the values sum to 1
A
array([0.84203357, 0.04192238, 0.00208719, 0.11395685])
We can see that Class_0 having a large value makes it likely and, conversely, Class_2 having a low value makes it unlikely, thus mirroring our Sigmoid Activation intuition.
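To actually turn these likelihoods into a prediction, we still pick the index of the largest entry. Here’s a minimal sketch of a reusable helper (softmax is my own function name, nothing defined in the course), continuing with the Z and A from above; subtracting np.max(z) before exponentiating is a standard trick to avoid overflow and leaves the result unchanged:
def softmax(z):
    # shift scores by their max before exponentiating; softmax is
    # invariant to a constant shift, so this only improves stability
    t = np.exp(z - np.max(z))
    return t / np.sum(t)

np.allclose(softmax(Z), A)
True
np.argmax(A)
0
np.argmax(A) returning 0 confirms Class_0 as the predicted class, matching the intuition above.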