
Logistic Regression Basics

Stated with variables

Our goal is to produce predictions that closely match the actual values

We’ve got a bunch of input data

$x \in \mathbb{R}^n$

We’ve got our 0 or 1 target

$y \in \{0, 1\}$

Our predictions between 0 and 1

$\hat{y} \in (0, 1)$

We’ll arrive at our predictions using our weights

$w \in \mathbb{R}^n$

And our bias unit

$b \in \mathbb{R}$

Both of which will be a result of our computation

But we need to coerce our prediction values to fall between 0 and 1, so we need a sigmoid function.

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    # Squash any real number into the (0, 1) range
    return 1 / (1 + np.exp(-z))

As you can see, the values tend to 0 for large negative inputs and to 1 for large positive inputs. Furthermore, the curve passes through y = 0.5 at x = 0.

X = np.linspace(-10, 10, 100)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(X, sigmoid(X))
ax.axvline(0, color='k')
[Figure: the sigmoid curve over x ∈ [−10, 10], with a vertical line at x = 0]

Loss Function

So our prediction is going to be a multiplication of the inputs $x_1, \dots, x_n$ by the weights $w_1, \dots, w_n$, plus a bias term $b$, passed through the sigmoid.

$$\hat{y} = \sigma(w^T x + b) \quad \text{where} \quad \sigma(z) = \frac{1}{1 + e^{-z}}$$
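As a minimal sketch of that computation, with made-up values for `w`, `x`, and `b` (the numbers here are purely illustrative):

import numpy as np

# Hypothetical example values; any weights, input record, and bias would do
w = np.array([0.5, -1.2, 0.3])   # weights, w in R^n
x = np.array([2.0, 1.0, -0.5])   # one input record, x in R^n
b = 0.1                          # bias term

z = np.dot(w, x) + b             # w^T x + b
y_hat = 1 / (1 + np.exp(-z))     # sigma(z), guaranteed to land in (0, 1)
print(y_hat)                     # ~0.438

No matter how extreme the weighted sum gets, the sigmoid keeps the prediction strictly between 0 and 1.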

Traditionally, we might consider some sort of cost function like squared error: half the squared difference between the prediction and the actual value.

$$L(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^2$$

However, with the sigmoid inside it, this leads to a very poorly-behaved (non-convex) cost surface. Instead, we use:

$$L(\hat{y}, y) = -\left(y \log \hat{y} + (1 - y)\log(1 - \hat{y})\right)$$
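A minimal sketch of this per-record loss (the function name `cross_entropy_loss` and the example predictions are just for illustration):

import numpy as np

def cross_entropy_loss(y_hat, y):
    # L(y_hat, y) = -(y*log(y_hat) + (1 - y)*log(1 - y_hat))
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(cross_entropy_loss(0.95, 1))   # confident and correct -> ~0.05
print(cross_entropy_loss(0.05, 1))   # confident and wrong   -> ~3.0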

Intuition

Recall the shape of the log function:

  • It tends to negative infinity as its argument approaches 0
  • It is exactly 0 at 1
  • It grows (slowly) toward positive infinity

fig, ax = plt.subplots(figsize=(10, 6))

X = np.linspace(0.001, 10, 100)
ax.plot(X, np.log(X))
ax.axhline(0, color='black')
[Figure: log(x) over (0, 10], with a horizontal line at y = 0]

So, looking back at this loss function and considering the behavior of log, consider what happens in the following scenarios.

If $y = 1$

Our Loss Function becomes

$$L = -\left(\log(\hat{y}) + 0\right)$$

Therefore, if we predict 1, then $\log(1)$ evaluates to 0: no error.

Conversely, if we predict 0, then we have basically infinite error. We don’t ever want to be certain that it’s a 0 when it’s actually not.
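A quick numeric check (just plugging a few made-up predictions into $-\log(\hat{y})$, the loss when $y = 1$):

import numpy as np

# y = 1, so the loss reduces to -log(y_hat)
for y_hat in [0.99, 0.5, 0.01]:
    print(y_hat, -np.log(y_hat))   # ~0.01, ~0.69, ~4.6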

fig, ax = plt.subplots(figsize=(10, 6))

X = np.linspace(0.001, 1, 100)
ax.plot(X, -np.log(X))
[Figure: −log(x) over (0, 1], the loss curve when y = 1]

If $y = 0$

Our Loss Function becomes

$$L(\hat{y}, y) = -\left(0 + (1)\log(1 - \hat{y})\right)$$

And looking at that last term, we see that as our prediction gets closer and closer to 1, the error becomes infinite.
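The mirror-image check (again with made-up predictions, now plugged into $-\log(1 - \hat{y})$):

import numpy as np

# y = 0, so the loss reduces to -log(1 - y_hat)
for y_hat in [0.01, 0.5, 0.99]:
    print(y_hat, -np.log(1 - y_hat))   # ~0.01, ~0.69, ~4.6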

fig, ax = plt.subplots(figsize=(10, 6))

X = np.linspace(0., .999, 100)
ax.plot(X, -np.log(1-X))
[Figure: −log(1 − x) over [0, 1), the loss curve when y = 0]

Cost Function

If this intuition makes sense at the record level, then averaging this loss function over each of our records helps us arrive at our Cost Function, expressed as

$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}_i, y_i)$$

$$J(w, b) = -\frac{1}{m}\sum_{i=1}^{m}\left(y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right)$$
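Putting it all together, here is a vectorized sketch of this cost over all $m$ records (the `cost` function and the toy data are hypothetical, not a reference implementation):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(w, b, X, y):
    # X: (m, n) input records, y: (m,) labels of 0 or 1
    y_hat = sigmoid(X @ w + b)                                   # predictions for all m records
    losses = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))  # per-record loss L
    return losses.mean()                                         # J(w, b) = average loss

# Toy example: 4 records, 2 features
X = np.array([[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3], [2.0, 0.1]])
y = np.array([1, 0, 0, 1])
w = np.array([0.4, -0.2])
b = 0.0
print(cost(w, b, X, y))   # ~0.60 for this toy data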