# Representing XNOR via a simple Net

Imagine we want to build a simple classifier for the following dataset, XNOR, which follows the truth table:

| x1 | x2 | XNOR |
|----|----|------|
| 0  | 0  | 1    |
| 0  | 1  | 0    |
| 1  | 0  | 0    |
| 1  | 1  | 1    |

When we plot this out visually, it’s clear that there’s no good way to do this in any sort of linear fashion. Where would you draw a straight line to get good separability?

%pylab inline
from IPython.display import Image
from helpers.xnor import make_xnor_dataset, plot_xnor_dataset

X, y = make_xnor_dataset(10)
plot_xnor_dataset(X, y);


## Deconstructing

More specifically, if we want to represent the entirety of the truth table, we can do so with

$\big( X_1 \wedge X_2 \big) \lor \big( (\neg X_1) \wedge (\neg X_2) \big)$

Which can, in turn, be broken down into sub-expressions. But first, recall the sigmoid function from logistic regression, which squashes a real-valued linear combination into a value between 0 and 1 that we can threshold into a binary prediction.

x_ = np.linspace(-5, 5, 1000)
y_ = 1 / (1 + np.exp(-x_))

plt.plot(x_, y_)
plt.gca().axvline(0, c='k', alpha=.5);


With a decision threshold of 0.5, our True values will be the ones that get piped through the sigmoid function and come out the other side with a value greater than 0.5. Simplifying, this means that if we want to output a True value, we just have to make sure that the linear combination that goes into the sigmoid is greater than zero.
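To make that concrete, here's a quick sanity check (a sketch of our own, using NumPy):

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into the interval (0, 1)
    return 1 / (1 + np.exp(-z))

# thresholding the sigmoid at 0.5 is equivalent to checking the sign of z
for z in (-10.0, -1.0, 1.0, 10.0):
    assert (sigmoid(z) >= 0.5) == (z >= 0)
```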

So breaking down the XNOR representation above, we used the following three logical operators:

- AND
- OR
- NOT

### AND

Consider a simple dummy network where we're trying to determine the weights, per sample, for nodes X1 and X2, which will stand in for the X and Y coordinates above. Also recall that the bias unit, X0, is always 1, regardless of the value of our training example.

Image('images/weights.png')

We want to figure out weights for each of these nodes such that every True/False combination that comes in for X1 and X2 reproduces the correct truth table. Thus, we arrive at

Image('images/and.PNG')

Yielding the following truth table

| x1 | x2 | z   | AND |
|----|----|-----|-----|
| 0  | 0  | -30 | 0   |
| 0  | 1  | -10 | 0   |
| 1  | 0  | -10 | 0   |
| 1  | 1  | 10  | 1   |
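Using the weights from the figure (the bias carries −30, and each input carries +20; the function name here is our own), we can verify the table in code:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def and_gate(x1, x2):
    # bias weight of -30 on the always-on unit x0, +20 on each input,
    # so z only exceeds 0 when both inputs are on
    z = -30 + 20 * x1 + 20 * x2
    return int(sigmoid(z) >= 0.5)

# reproduces the AND column of the truth table
assert [and_gate(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 0, 0, 1]
```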

### OR

We can follow a similar exercise for OR to get

Image('images/or.PNG')

| x1 | x2 | z   | OR |
|----|----|-----|----|
| 0  | 0  | -10 | 0  |
| 0  | 1  | 10  | 1  |
| 1  | 0  | 10  | 1  |
| 1  | 1  | 30  | 1  |
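The only change from AND is the bias (again, a sketch of our own): with −10 instead of −30, a single active input is enough to push z above zero.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def or_gate(x1, x2):
    # bias of -10: either input's +20 weight is enough to make z positive
    z = -10 + 20 * x1 + 20 * x2
    return int(sigmoid(z) >= 0.5)

# reproduces the OR column of the truth table
assert [or_gate(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 1]
```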

### NOT

And a much-simpler NOT gives

Image('images/not.png')

| x1 | z   | NOT |
|----|-----|-----|
| 0  | 10  | 1   |
| 1  | -10 | 0   |
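In code (same sketch style as above): a positive bias plus a large negative weight flips the input.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def not_gate(x1):
    # positive bias, large negative weight: z is positive only when x1 is 0
    z = 10 - 20 * x1
    return int(sigmoid(z) >= 0.5)

# reproduces the NOT column of the truth table
assert [not_gate(0), not_gate(1)] == [1, 0]
```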

### Combining

Using the same intuition as above, we can untangle this messy logical statement into a network that looks like the following

$\big( X_1 \wedge X_2 \big) \lor \big( (\neg X_1) \wedge (\neg X_2) \big)$

Image('images/xnor.png')

And verify this checks out via the Truth Table (and a lot of tedious HTML)

| x1 | x2 | z1  | a1 | z2  | a2 | OR |
|----|----|-----|----|-----|----|----|
| 0  | 0  | -30 | 0  | 10  | 1  | 1  |
| 0  | 1  | -10 | 0  | -10 | 0  | 0  |
| 1  | 0  | -10 | 0  | -10 | 0  | 0  |
| 1  | 1  | 10  | 1  | -30 | 0  | 1  |
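Chaining the gates together by hand (weights as in the figures above; this hand-wired sketch and its function names are our own) reproduces the full table:

```python
import numpy as np

def step(z):
    # sigmoid followed by the 0.5 threshold -- i.e. a check on the sign of z
    return int(1 / (1 + np.exp(-z)) >= 0.5)

def xnor(x1, x2):
    a1 = step(-30 + 20 * x1 + 20 * x2)    # x1 AND x2             (z1 column)
    a2 = step(10 - 20 * x1 - 20 * x2)     # (NOT x1) AND (NOT x2) (z2 column)
    return step(-10 + 20 * a1 + 20 * a2)  # a1 OR a2

# reproduces the XNOR truth table
assert [xnor(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [1, 0, 0, 1]
```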

## Implementing

Generating more training data, split equally between True and False

X, y = make_xnor_dataset(1000)

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(X[:, 0], X[:, 1]);

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()

# a hidden layer is required here: a single sigmoid unit is just logistic
# regression, which cannot separate XNOR
model.add(Dense(units=4, activation='sigmoid', input_dim=2))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam')
model.fit(X, y, epochs=200, verbose=0);
predictions = (model.predict(X) >= .5).reshape(-1).astype(int)
plot_xnor_dataset(X, predictions);

And how!

## Efficiency

We managed to represent this behavior with a single hidden layer. But consider what would happen if we wanted to solve a similar problem: computing *exclusive OR over 10 inputs*. As Andrew Ng points out in Week 4 of his Deep Learning course, representing this as a deep neural network is orders of magnitude more efficient than as one tall hidden layer.

This relies on the same principle as exponential growth: splitting things by a factor of 2 at each layer, versus enumerating all $2^n$ different input patterns in a single layer.
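As a rough, back-of-the-envelope illustration (the unit counts below are order-of-magnitude figures, not exact minima): a binary tree of two-input XOR gates needs about $n$ gates, while a single hidden layer that enumerates input patterns needs on the order of $2^{n-1}$ units.

```python
from functools import reduce

n = 10

# deep approach: a binary tree of two-input XOR gates, one gate per
# pairwise merge -- n - 1 gates, depth ~log2(n)
deep_gates = n - 1

# shallow approach: one hidden unit per odd-parity input pattern,
# on the order of 2 ** (n - 1) units
shallow_units = 2 ** (n - 1)

def xor_tree(bits):
    # pairwise-reduce the inputs with two-input XORs
    return reduce(lambda a, b: a ^ b, bits)

assert xor_tree([1, 0, 1, 1]) == 1        # odd number of 1s
assert xor_tree([1, 1, 0, 0]) == 0        # even number of 1s
assert shallow_units // deep_gates > 50   # already far apart at n = 10
```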

Image('images/xor_exp.png')