Inception Architecture
As we’ve discussed in other notebooks, a key reason we employ convolution in our image networks is to control the complexity of our model.
When we apply N convolutional filters to a given layer, the following layer has a final (channel) dimension equal to N, one channel for each filter.
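As a quick sanity check, here’s a minimal Keras sketch (the input size and filter count are arbitrary) confirming that the number of filters sets the channel dimension of the output.
from keras.layers import Input, Conv2D
from keras.models import Model

# a 64x64 RGB input run through 8 filters should yield 8 output channels
inputs = Input(shape=(64, 64, 3))
outputs = Conv2D(filters=8, kernel_size=(3, 3), padding='same')(inputs)
Model(inputs, outputs).output_shape  # (None, 64, 64, 8)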
1x1 Convolution
Because convolution gets applied across all channels, a 1x1 convolution is less about capturing features in a given area of any one channel, and more about translating information across channels into other, easier-to-compute dimensions.
Intuition
It’s helpful to consider a 1x1 convolution as a sort of “Fully Connected sub-layer” that maps the values in all channels to one output cell in the next layer.
You can see that this intuition holds below, considering that we’re evaluating 32 input values against 32 weights, basic hidden-layer stuff.
from IPython.display import Image
Image('images/one_by_one.png')
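To make that concrete, here’s a tiny numpy sketch (the values are made up) showing that a single 1x1 filter at one spatial position is just a dot product of the 32 channel values against 32 weights, that is, one unit of a dense layer.
import numpy as np

np.random.seed(0)
pixel_channels = np.random.randn(32)   # the 32 channel values at one spatial position
filter_weights = np.random.randn(32)   # a 1x1 filter holds one weight per input channel

# the 1x1 convolution's output at that position is a plain dot product
output_value = np.dot(pixel_channels, filter_weights)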
Additionally, applying more 1x1 convolution filters allows us to translate from the input’s channel dimension to arbitrarily many channels in the next layer, while maintaining the information gain of training (because each FC sub-layer will still update on backprop like a normal network).
Image('images/net_in_net.png')
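Here’s a minimal Keras sketch of that translation (the 192-to-16 squeeze mirrors the example in the next section); note that each of the 16 filters carries a full set of 192 weights, so it really is a small fully connected layer across channels.
from keras.layers import Input, Conv2D
from keras.models import Model

inputs = Input(shape=(28, 28, 192))
# 16 filters of size 1x1 squeeze 192 channels down to 16
squeezed = Conv2D(filters=16, kernel_size=(1, 1), activation='relu')(inputs)

model_1x1 = Model(inputs, squeezed)
model_1x1.output_shape     # (None, 28, 28, 16)
model_1x1.count_params()   # 192 * 16 weights + 16 biases = 3,088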
But how is this useful?
Computation Benefits
Consider a simple case where we want to go from a 28x28x192 layer to a 28x28x32 layer via 32 5x5 filters (with padding, so the spatial dimensions stay 28x28).
Image('images/shrink_channels_before.png')
The number of calculations that happen here is the product of:
- The dimensions of the output layer
- The number of channels in the input layer
- The size of the filters
Giving us
$ (28 * 28 * 32) * (192) * (5 * 5) \approx 120M$
Now see what happens when we use a 1x1 convolution to create an intermediate layer.
Image('images/shrink_channels_after.png')
Enumerating the calculations happens in two stages.
First, going from the input layer to the hidden layer.
$ (28 * 28 * 16) * (192) * (1 * 1) \approx 2.4M $
Then, going from the hidden layer to the output layer.
$ (28 * 28 * 32) * (16) * (5 * 5) \approx 10M $
Summing the two, we get roughly 12.4 million, about a tenth of the computations from before, while still outputting a 28x28x32 layer and maintaining strong information gain by employing multiple “Fully Connected sub-layers” as mentioned above.
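A quick back-of-the-envelope check of those counts (multiplications only; biases and additions are ignored):
naive = (28 * 28 * 32) * 192 * (5 * 5)          # ~120.4M multiplications
bottleneck = ((28 * 28 * 16) * 192 * (1 * 1)
              + (28 * 28 * 32) * 16 * (5 * 5))  # ~12.4M multiplications
naive / bottleneck                              # roughly a 10x reduction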
Inception Network
Block Level
And so the Inception Network developed by Google uses this to great effect. Instead of figuring out which filter/kernel size to apply from layer to layer, they build in 1x1, 3x3, and 5x5 convolutions, as well as a Max-Pool layer for good measure, then concatenate them all together into one huge, 256-channel output. They leave it to backpropagation to figure out which sections of the output are worth using for information gain.
Image('images/inception_motivation.png')
Mechanically, as above, they leverage the computation reduction afforded by 1x1 filters within each component. This practice is often referred to as a bottleneck layer, wherein you shrink the representation before expanding it again via convolution filters.
Image('images/inception_block.png')
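Here’s a minimal sketch of one such block using the Keras functional API; the per-branch filter counts are illustrative, chosen so that the branches concatenate to the 256-channel output shown above.
from keras.layers import Input, Conv2D, MaxPooling2D, concatenate
from keras.models import Model

inputs = Input(shape=(28, 28, 192))

# 1x1 branch
branch_1x1 = Conv2D(64, (1, 1), padding='same', activation='relu')(inputs)

# 1x1 bottleneck, then 3x3
branch_3x3 = Conv2D(96, (1, 1), padding='same', activation='relu')(inputs)
branch_3x3 = Conv2D(128, (3, 3), padding='same', activation='relu')(branch_3x3)

# 1x1 bottleneck, then 5x5
branch_5x5 = Conv2D(16, (1, 1), padding='same', activation='relu')(inputs)
branch_5x5 = Conv2D(32, (5, 5), padding='same', activation='relu')(branch_5x5)

# max-pool, followed by a 1x1 to control its channel count
branch_pool = MaxPooling2D((3, 3), strides=(1, 1), padding='same')(inputs)
branch_pool = Conv2D(32, (1, 1), padding='same', activation='relu')(branch_pool)

# stack all four branches along the channel axis: 64 + 128 + 32 + 32 = 256
block = concatenate([branch_1x1, branch_3x3, branch_5x5, branch_pool])
Model(inputs, block).output_shape  # (None, 28, 28, 256)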
This results in:
- Very flexible learning strategies
- Relatively cheap computation
At Scale
So much so that the full architecture is implemented as a bunch of these blocks chained together.
Image('images/inception_unzoomed.png')
Using It
V3
As we mentioned in the VGG architecture notebook, the Inception architecture is available for use in keras
(and is also a hefty download if you haven’t used it yet!)
from keras.applications import inception_v3
model = inception_v3.InceptionV3()
Using TensorFlow backend.
I’ll spare you scrolling through model.summary(); it’s pretty huge.
len(model.layers)
313
Total params: 23,851,784
Trainable params: 23,817,352
Non-trainable params: 34,432
Documentation is available here
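If you want to actually run it, a minimal sketch looks something like this (the image path is a stand-in; InceptionV3 expects 299x299 inputs):
import numpy as np
from keras.preprocessing import image
from keras.applications.inception_v3 import preprocess_input, decode_predictions

# load and preprocess a single image ('images/cat.jpg' is a hypothetical path)
img = image.load_img('images/cat.jpg', target_size=(299, 299))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# top-5 ImageNet guesses from the pretrained model defined above
decode_predictions(model.predict(x), top=5)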
Inception ResNet
Alternatively, there is promising work being done to combine the best elements of the Inception framework with the information-passing elements of residual Neural Networks.
You can employ the latest version of this work, again using keras, with the following.
from keras.applications import inception_resnet_v2
model = inception_resnet_v2.InceptionResNetV2()
It’s even bigger.
len(model.layers)
782
Total params: 55,873,736
Trainable params: 55,813,192
Non-trainable params: 60,544