Logistic Regression Basics
Stated with variables
Our goal is to make predictions that accurately reflect the actual values.
We’ve got a bunch of input data, $x \in \mathbb{R}^n$.
We’ve got our 0-or-1 target, $y$.
Our predictions, between 0 and 1: $\hat{y}$.
We’ll arrive at our predictions using our weights, $w \in \mathbb{R}^n$, and our bias unit, $b \in \mathbb{R}$, both of which will be the result of our computation.
But we need to coerce our prediction values to be between 0 and 1, which is why we need a sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
%pylab inline
def sigmoid(z):
    # Squash any real-valued input into the range (0, 1)
    return 1 / (1 + np.exp(-z))
Populating the interactive namespace from numpy and matplotlib
Because, as you can see, the values tend to 0 for negative numbers and to 1 for positive numbers. Furthermore, the curve crosses $x = 0$ at $y = 0.5$.
# Plot the sigmoid over [-10, 10], with a vertical line at x = 0
X = np.linspace(-10, 10, 100)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(X, sigmoid(X))
ax.axvline(0, color='k')
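As a quick numeric check of those properties (using the sigmoid defined above; the inputs -10, 0, and 10 are just illustrative):

print(sigmoid(-10))  # ~0.000045 -- large negative inputs approach 0
print(sigmoid(0))    # 0.5 -- the curve crosses the midpoint at z = 0
print(sigmoid(10))   # ~0.999955 -- large positive inputs approach 1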
Cost Function
So our prediction is going to be a multiplication of the inputs $x_1, \ldots, x_n$ by the weights $w_1, \ldots, w_n$, plus a bias term $b$.

$$\hat{y} = \sigma(w^T x + b) \quad \text{where} \quad \sigma(z) = \frac{1}{1 + e^{-z}}$$
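Here's a minimal sketch of that computation in numpy; the weights w, bias b, and input x below are made-up example values, not fitted parameters:

# Hypothetical weights, bias, and a single input record
w = np.array([0.5, -1.2, 0.3])
b = 0.1
x = np.array([1.0, 0.4, 2.0])

# y_hat = sigma(w^T x + b): a probability between 0 and 1
y_hat = sigmoid(np.dot(w, x) + b)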
Traditionally, we might consider some sort of cost function like squared error: the difference between the prediction and the actual value, squared.

$$\mathcal{L}(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^2$$

However, run through the sigmoid this leads to some very poorly-behaved, non-convex curves, with local minima that make optimization hard. Instead, we use:

$$\mathcal{L}(\hat{y}, y) = -\big(y \log \hat{y} + (1 - y) \log(1 - \hat{y})\big)$$
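Here's a minimal sketch of that loss for a single record; the helper name loss is mine, and it assumes y is 0 or 1 while y_hat lies strictly between 0 and 1:

def loss(y_hat, y):
    # Cross-entropy loss for one prediction/label pair
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))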
Intuition
Recall the shape of the log function:
- It's basically negative infinity at 0
- It is exactly 0 at 1
- It scales (slowly) to positive infinity
# Plot log(x) over (0, 10], with a horizontal line at y = 0
fig, ax = plt.subplots(figsize=(10, 6))
X = np.linspace(0.001, 10, 100)
ax.plot(X, np.log(X))
ax.axhline(0, color='black')
So, looking back at this loss function and keeping the behavior of log in mind, consider what happens in the following scenarios.
If $y = 1$

Our Loss Function becomes

$$\mathcal{L} = -\big(\log(\hat{y}) + 0\big)$$
Therefore, if we predict 1, then $\log(1)$ evaluates to 0: no error. Conversely, if we predict 0, then we have basically infinite error. We don't ever want to be certain that it's a 0 when it's actually not.
# -log(y_hat): the loss when y = 1, as the prediction ranges over (0, 1]
fig, ax = plt.subplots(figsize=(10, 6))
X = np.linspace(0.001, 1, 100)
ax.plot(X, -np.log(X))
If $y = 0$

Our Loss Function becomes

$$\mathcal{L}(\hat{y}, y) = -\big(0 + (1)\log(1 - \hat{y})\big)$$
And looking at that last term, we see that as our prediction gets closer and closer to 1, the error becomes infinite.
# -log(1 - y_hat): the loss when y = 0, as the prediction ranges over [0, 1)
fig, ax = plt.subplots(figsize=(10, 6))
X = np.linspace(0., .999, 100)
ax.plot(X, -np.log(1 - X))
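To make both scenarios concrete, here's what the loss helper sketched above returns for confident-right and confident-wrong predictions (the 0.99 and 0.01 values are just illustrative):

# y = 1: a confident correct prediction is cheap; a confident wrong one is very expensive
print(loss(0.99, 1))  # ~0.01
print(loss(0.01, 1))  # ~4.61

# y = 0: the same story in reverse
print(loss(0.01, 0))  # ~0.01
print(loss(0.99, 0))  # ~4.61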
Cost Function
If this intuition makes sense at the record level, then averaging this loss over all of our records helps us arrive at our Cost Function, expressed as
$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}_i, y_i)$$

$$J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \big(y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i)\big)$$
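And a vectorized sketch of that cost, assuming y and y_hat are numpy arrays holding all m labels and predictions (the helper name cost is, again, just for illustration):

def cost(y_hat, y):
    # Average cross-entropy loss over all m records
    m = y.shape[0]
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m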