Convolutional networks

In this section, we train an implementation of LeNet-5 [LBBH98] on MNIST. LeNet-5 was used in the 1990s to identify handwritten zip code numbers in the US Postal Service (Fig. 47). This network is characterized as having convolution and pooling blocks as feature extractor. The features are then passed to an MLP with 10 output nodes for each class label.

Fig. 45 Lenet-5 (1989) network architecture. Source

Remark. A block is composed of multiple layers that together form a basic functional unit. This is generally used in designing neural net architectures. See also AlexNet [KSH12] and VGG [SZ14] which take the conv+pool blocks design to the extreme (Fig. 46). AlexNet and VGG likewise contain consecutive convolutional blocks that downsample the spatial dims, while increasing the number of output channels so that network capacity is not diminished.