Convolutional networks
In this section, we train an implementation of LeNet-5 [LBBH98] on MNIST. LeNet-5 was used in the 1990s to identify handwritten zip code numbers in the US Postal Service (Fig. 47). This network is characterized as having convolution and pooling blocks as feature extractor. The features are then passed to an MLP with 10 output nodes for each class label.
Remark. A block is composed of multiple layers that together form a basic functional unit. This is generally used in designing neural net architectures. See also AlexNet [KSH12] and VGG [SZ14] which take the conv+pool blocks design to the extreme (Fig. 46). AlexNet and VGG likewise contain consecutive convolutional blocks that downsample the spatial dims, while increasing the number of output channels so that network capacity is not diminished.

Fig. 46 Network architecture of AlexNet and VGG. More layers means more processing, which is why we see repeated convolutions and blocks. Source

Fig. 47 A bit of history. Timeline of the development of LeNet and MNIST. Source