Keras API Basics
The keras API provides an excellent wrapper around various Deep Learning libraries, giving us easy-to-use, uniform code while still plugging into expressive backends.
Generally speaking, keras offers two interfaces to the underlying libraries it abstracts:
- Sequential: object-oriented, built by adding layers to a model one at a time
- Functional: as the name implies, built by calling layers on tensors like functions
To explain the difference, we'll build the same Network both ways. This will consist of:
- Creating the structure:
  - A Dense 32-node layer that takes input shape 784
  - Another 2 Dense 32-node layers
  - A final Dense 10-node layer with a softmax() activation function
- Compiling the model with the categorical_crossentropy loss function and the adam optimizer
- Printing a summary of our model
Sequential API
from keras import layers
from keras import models
Using TensorFlow backend.
model = models.Sequential()
model.add(layers.Dense(32, input_shape=(784,)))
model.add(layers.Dense(32))
model.add(layers.Dense(32))
model.add(layers.Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 32) 25120
_________________________________________________________________
dense_2 (Dense) (None, 32) 1056
_________________________________________________________________
dense_3 (Dense) (None, 32) 1056
_________________________________________________________________
dense_4 (Dense) (None, 10) 330
=================================================================
Total params: 27,562
Trainable params: 27,562
Non-trainable params: 0
_________________________________________________________________
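The parameter counts fall straight out of the layer sizes: a Dense layer has inputs * units weights plus units biases. As a quick sanity check:
784 * 32 + 32           # dense_1 -> 25,120
32 * 32 + 32            # dense_2 and dense_3 -> 1,056 each
32 * 10 + 10            # dense_4 -> 330
25120 + 2 * 1056 + 330  # total -> 27,562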
Functional API
Very similar to the Sequential model above (though this version uses relu activations and one fewer hidden layer), but we have to manually specify how layers flow into one another, via the trailing (previous_tensor) call syntax.
Additionally, we specify which tensors are the first and last in the model; in this case, they're the tensors produced by layers.Input() and layers.Dense(10).
input_tensor = layers.Input(shape=(784,))
x1 = layers.Dense(32, activation='relu')(input_tensor)
x2 = layers.Dense(32, activation='relu')(x1)
output_tensor = layers.Dense(10, activation='softmax')(x2)
model = models.Model(inputs=input_tensor, outputs=output_tensor)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 784) 0
_________________________________________________________________
dense_5 (Dense) (None, 32) 25120
_________________________________________________________________
dense_6 (Dense) (None, 32) 1056
_________________________________________________________________
dense_7 (Dense) (None, 10) 330
=================================================================
Total params: 26,506
Trainable params: 26,506
Non-trainable params: 0
_________________________________________________________________
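A Model built this way is compiled and trained exactly like a Sequential one; for example, with the same settings as before:
model.compile(loss='categorical_crossentropy', optimizer='adam')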
Movie Example
Per chapter 3 of Francois Chollet's Deep Learning with Python, let's take a quick look at how to build a simple model using data that ships with keras.
The imdb dataset is essentially 50k movie reviews, where X is a label-encoded representation of the words in a review and y is a positive (1) or negative (0) score.
from keras.datasets import imdb
num_words=10000 keeps only the 10,000 most frequent words in the dataset; rarer words are dropped from each review.
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
Insightful stuff in this review
train_data[0][:10]
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65]
They seemed to like the movie
train_labels[0]
1
keras.datasets.imdb comes pre-loaded with a dictionary to help decode the X representations of reviews. With some clever dict magic, we can reconstruct what the original review said, more or less.
Note: indexes 0, 1, and 2 are reserved for "padding", "start of sequence", and "unknown", hence the idx - 3 offset inside the get() call.
word_index = imdb.get_word_index()
reverse_word_index = {idx: word for word, idx in word_index.items()}
decoded_review = ' '.join(reverse_word_index.get(idx - 3, '?') for idx in train_data[0])
decoded_review
"? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert ? is an amazing actor and now the same being director ? father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for ? and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also ? to the two little boy's that played the ? of norman and paul they were just brilliant children are often left out of the ? list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all"
Taking this one step further, though, we want to translate each variable-length list of word indices into a 1 x num_words one-hot-encoded vector that a Neural Network can consume.
import numpy as np
def vectorize_sequences(sequences, dimension=10000):
    # one row per review, one column per word index; 1.0 wherever a word appears
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
x_train.shape
(25000, 10000)
Much better
x_train[0]
array([ 0., 1., 1., ..., 0., 0., 0.])
The transformation on y is trivial: just list to np.array.
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
Reusing a Sequential() architecture like the one above, but with a single sigmoid output node for binary classification.
model = models.Sequential()
model.add(layers.Dense(32, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
Note that we also ask for the accuracy metric (more on this in a sec).
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
We further split x_train and y_train in order to set aside some validation data.
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]
By passing (x_val, y_val) as validation_data, we can do some validation on the fly. Note that we assign the output of model.fit() to history.
history = model.fit(partial_x_train, partial_y_train,
epochs=20, batch_size=512,
validation_data=(x_val, y_val))
Train on 15000 samples, validate on 10000 samples
Epoch 1/20
15000/15000 [==============================] - 3s 176us/step - loss: 0.5184 - acc: 0.7785 - val_loss: 0.3404 - val_acc: 0.8718
Epoch 2/20
15000/15000 [==============================] - 2s 143us/step - loss: 0.2426 - acc: 0.9129 - val_loss: 0.2770 - val_acc: 0.8911
Epoch 3/20
15000/15000 [==============================] - 2s 146us/step - loss: 0.1533 - acc: 0.9473 - val_loss: 0.2972 - val_acc: 0.8829
Epoch 4/20
15000/15000 [==============================] - 2s 147us/step - loss: 0.1063 - acc: 0.9669 - val_loss: 0.3246 - val_acc: 0.8811
Epoch 5/20
15000/15000 [==============================] - 2s 149us/step - loss: 0.0734 - acc: 0.9817 - val_loss: 0.3586 - val_acc: 0.8767
Epoch 6/20
15000/15000 [==============================] - 2s 145us/step - loss: 0.0489 - acc: 0.9905 - val_loss: 0.4016 - val_acc: 0.8783
Epoch 7/20
15000/15000 [==============================] - 2s 147us/step - loss: 0.0319 - acc: 0.9949 - val_loss: 0.4464 - val_acc: 0.8742
Epoch 8/20
15000/15000 [==============================] - 2s 145us/step - loss: 0.0206 - acc: 0.9977 - val_loss: 0.4896 - val_acc: 0.8719
Epoch 9/20
15000/15000 [==============================] - 2s 143us/step - loss: 0.0140 - acc: 0.9994 - val_loss: 0.5310 - val_acc: 0.8700
Epoch 10/20
15000/15000 [==============================] - 2s 143us/step - loss: 0.0100 - acc: 0.9998 - val_loss: 0.5670 - val_acc: 0.8692
Epoch 11/20
15000/15000 [==============================] - 2s 145us/step - loss: 0.0067 - acc: 0.9999 - val_loss: 0.5972 - val_acc: 0.8677
Epoch 12/20
15000/15000 [==============================] - 2s 145us/step - loss: 0.0048 - acc: 0.9999 - val_loss: 0.6185 - val_acc: 0.8693
Epoch 13/20
15000/15000 [==============================] - 2s 145us/step - loss: 0.0037 - acc: 0.9999 - val_loss: 0.6399 - val_acc: 0.8678
Epoch 14/20
15000/15000 [==============================] - 2s 144us/step - loss: 0.0029 - acc: 0.9999 - val_loss: 0.6599 - val_acc: 0.8687
Epoch 15/20
15000/15000 [==============================] - 2s 145us/step - loss: 0.0024 - acc: 0.9999 - val_loss: 0.6764 - val_acc: 0.8679
Epoch 16/20
15000/15000 [==============================] - 2s 145us/step - loss: 0.0020 - acc: 0.9999 - val_loss: 0.6914 - val_acc: 0.8675
Epoch 17/20
15000/15000 [==============================] - 2s 144us/step - loss: 0.0017 - acc: 0.9999 - val_loss: 0.7069 - val_acc: 0.8669
Epoch 18/20
15000/15000 [==============================] - 2s 146us/step - loss: 0.0014 - acc: 1.0000 - val_loss: 0.7203 - val_acc: 0.8672
Epoch 19/20
15000/15000 [==============================] - 2s 145us/step - loss: 0.0012 - acc: 1.0000 - val_loss: 0.7322 - val_acc: 0.8671
Epoch 20/20
15000/15000 [==============================] - 2s 145us/step - loss: 0.0011 - acc: 1.0000 - val_loss: 0.7448 - val_acc: 0.8669
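Since we vectorized x_test and y_test earlier, scoring the held-out test set is one call away; model.evaluate() returns the loss followed by any metrics passed to compile(). A quick sketch:
test_loss, test_acc = model.evaluate(x_test, y_test)
print(test_loss, test_acc)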
We can now access the history attribute of our history object (lol).
history_dict = history.history
history_dict.keys()
dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])
This allows us to look at performance over training time
%pylab inline
epochs = range(len(history_dict['loss']))
Populating the interactive namespace from numpy and matplotlib
plt.plot(epochs, history_dict['loss'])
plt.plot(epochs, history_dict['val_loss'])
[<matplotlib.lines.Line2D at 0x17a4c4cf358>]
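The same idea works for accuracy; a slightly fuller sketch with axis labels and a legend:
plt.plot(epochs, history_dict['acc'], label='train acc')
plt.plot(epochs, history_dict['val_acc'], label='val acc')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
Validation loss bottoms out around epochs 2-3 and climbs while training loss keeps falling, the classic sign of overfitting; the usual fix is to stop training earlier.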