### Convolutional neural network (LeNet)

A multilayer perceptron built from fully connected layers can classify the images in the Fashion-MNIST dataset. Each image is 28 pixels high and 28 pixels wide. To do so, we flatten the pixels row by row into a vector of length 784 and feed it into the fully connected layers. However, this classification method has some limitations.

- Pixels that are adjacent in the same column of the image may be far apart in this vector, so the patterns they form can be hard for the model to recognize.
- For large input images, fully connected layers easily make the model too large. Suppose the input is a color photo (3 channels) with a height and width of \(1000\) pixels. Even if the fully connected layer has only 256 outputs, its weight parameter has shape \(3000000 \times 256\), which occupies about 3 GB of memory or video memory when stored as 32-bit floats. This leads to an overly complex model and excessive storage overhead.
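
The two limitations above can be checked with a little arithmetic. The sketch below assumes 32-bit (4-byte) parameters and row-major flattening; the helper `flat_index` is a hypothetical name introduced only for this illustration.

```python
# Weight size of a fully connected layer fed with a flattened color image.
height, width, channels = 1000, 1000, 3
num_outputs = 256
bytes_per_param = 4  # assumption: parameters are stored as 32-bit floats

num_inputs = channels * height * width   # length of the flattened input
num_weights = num_inputs * num_outputs   # entries in the weight matrix
weight_gb = num_weights * bytes_per_param / 1e9

print('inputs:', num_inputs)
print('weights:', num_weights)
print('weight size in GB:', round(weight_gb, 2))


def flat_index(row, col, image_width=28):
    """Position of pixel (row, col) after row-by-row flattening."""
    return row * image_width + col


# Two vertically adjacent pixels of a 28 x 28 image end up one full row apart.
column_gap = flat_index(1, 0) - flat_index(0, 0)
print('gap between vertical neighbors:', column_gap)
```

The weight matrix alone accounts for roughly 3 GB, and vertical neighbors in a 28 x 28 image sit 28 positions apart in the flattened vector.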

The convolution layer attempts to solve both problems. On the one hand, it preserves the shape of the input, so that correlations between pixels along both the height and the width can be recognized effectively. On the other hand, it slides a window over the input and applies the same convolution kernel at every position, which keeps the number of parameters small.
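
To make the sliding-window idea concrete, here is a minimal pure-Python sketch of a two-dimensional cross-correlation (the operation a convolution layer computes), assuming no padding and a stride of 1. The function name `corr2d` and the sample values are chosen for illustration and are not part of the Gluon API.

```python
def corr2d(X, K):
    """2D cross-correlation of input X with kernel K (no padding, stride 1)."""
    h, w = len(K), len(K[0])
    out_h, out_w = len(X) - h + 1, len(X[0]) - w + 1
    Y = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel K is reused for every window of the input.
            Y[i][j] = sum(X[i + a][j + b] * K[a][b]
                          for a in range(h) for b in range(w))
    return Y


X = [[0, 1, 2],
     [3, 4, 5],
     [6, 7, 8]]
K = [[0, 1],
     [2, 3]]
Y = corr2d(X, K)
print(Y)
```

The kernel here has only four parameters no matter how large `X` is, which is why a convolution layer avoids the parameter growth of a fully connected layer.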

#### LeNet model

LeNet is divided into two parts: a convolution layer block and a fully connected layer block.

The convolution layer recognizes spatial patterns in the image, and the max pooling layer that follows reduces the convolution layer's sensitivity to position. The basic unit of the convolution layer block is a convolution layer followed by a max pooling layer. In the convolution layer block, each convolution layer uses a \(5 \times 5\) window and applies a sigmoid activation function to its output.

The output shape of the convolution layer block is (batch size, channel, height, width). When this output is passed to the fully connected layer block, the block flattens each sample in the mini-batch. In other words, the input to the fully connected layers becomes two-dimensional: the first dimension indexes the samples in the mini-batch, and the second dimension is the flattened vector representation of each sample, whose length is the product of channel, height, and width.
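
The flattening step can be sketched with nested Python lists. The mini-batch below is a hypothetical example with shape (2, 2, 2, 3), and `flatten_sample` is an illustrative helper rather than a library function.

```python
def flatten_sample(sample):
    """Flatten one (channel, height, width) sample into a flat list."""
    return [value for channel in sample for row in channel for value in row]


# Hypothetical mini-batch with shape (batch=2, channel=2, height=2, width=3).
batch = [[[[b * 100 + c * 10 + r * 3 + col for col in range(3)]
           for r in range(2)]
          for c in range(2)]
         for b in range(2)]

flat = [flatten_sample(sample) for sample in batch]
# The result is two-dimensional: (batch size, channel * height * width).
print(len(flat), len(flat[0]))
print(flat[0])
```

Each flattened sample has length 2 * 2 * 3 = 12, and the batch dimension is left untouched.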

    import d2lzh as d2l
    import mxnet as mx
    from mxnet import autograd, gluon, init, nd
    from mxnet.gluon import loss as gloss, nn
    import time

    # Build the network
    net = nn.Sequential()
    net.add(
        # Convolution layer: 6 output channels, 5x5 kernel, sigmoid activation
        nn.Conv2D(channels=6, kernel_size=5, activation='sigmoid'),
        # Max pooling layer: 2x2 window, stride 2
        nn.MaxPool2D(pool_size=2, strides=2),
        # Convolution layer: 16 output channels, 5x5 kernel, sigmoid activation
        nn.Conv2D(channels=16, kernel_size=5, activation='sigmoid'),
        # Max pooling layer: 2x2 window, stride 2
        nn.MaxPool2D(pool_size=2, strides=2),
        # Dense converts input of shape (batch size, channel, height, width)
        # into input of shape (batch size, channel * height * width)
        nn.Dense(120, activation='sigmoid'),
        nn.Dense(84, activation='sigmoid'),
        nn.Dense(10))

Next, construct a single-channel data sample with a height and width of 28, and run the forward computation layer by layer to inspect the output shape of each layer.

    # Create a random single-channel input sample
    X = nd.random.uniform(shape=(1, 1, 28, 28))
    # Initialize the network parameters
    net.initialize()
    # Pass the sample through the network layer by layer
    for layer in net:
        X = layer(X)
        print(layer.name, 'output shape:\t', X.shape)
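
The shapes to expect can also be worked out by hand. The sketch below assumes unpadded windows, a stride of 1 for the convolution layers, and the kernel, pooling, and channel settings from the network definition above. `out_size` is a hypothetical helper, not a Gluon function.

```python
def out_size(n, window, stride=1):
    """Output height or width for an unpadded convolution or pooling window."""
    return (n - window) // stride + 1


size, channels = 28, 1
# (kind, output channels, window size, stride) for the convolution layer block
layers = [('conv', 6, 5, 1), ('pool', None, 2, 2),
          ('conv', 16, 5, 1), ('pool', None, 2, 2)]

shapes = []
for kind, out_channels, window, stride in layers:
    size = out_size(size, window, stride)
    if kind == 'conv':
        channels = out_channels  # pooling keeps the channel count unchanged
    shapes.append((1, channels, size, size))
    print(kind, 'output shape:', shapes[-1])

# Length of each sample after flattening for the first fully connected layer
flat_len = channels * size * size
print('flattened length:', flat_len)
```

Under these assumptions the height and width shrink from 28 to 24, 12, 8, and finally 4, so the first fully connected layer receives vectors of length 16 * 4 * 4 = 256.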

#### Acquiring data and training the model

Fashion-MNIST is used as the training dataset.

    # Batch size: 256
    batch_size = 256
    # Get the training and test set iterators
    train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)

The evaluate_accuracy function:

    # Compute the classification accuracy of net on data_iter
    def evaluate_accuracy(data_iter, net, ctx):
        acc_sum, n = nd.array([0], ctx=ctx), 0
        for X, y in data_iter:
            # If ctx is a GPU, copy the inputs and labels to its video memory
            X, y = X.as_in_context(ctx), y.as_in_context(ctx).astype('float32')
            # Count the correct predictions
            acc_sum += (net(X).argmax(axis=1) == y).sum()
            n += y.size
        # Return the fraction of correct predictions
        return acc_sum.asscalar() / n
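
The accuracy computation itself is independent of MXNet. As a minimal sketch with plain Python lists, where the scores and labels are made-up values and `argmax` and `accuracy` are illustrative helpers:

```python
def argmax(values):
    """Index of the largest entry (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = sum(1 for row, y in zip(scores, labels) if argmax(row) == y)
    return correct / len(labels)


# Made-up class scores for four samples and three classes
scores = [[0.1, 0.7, 0.2],
          [0.8, 0.1, 0.1],
          [0.3, 0.3, 0.4],
          [0.2, 0.5, 0.3]]
labels = [1, 0, 1, 1]
acc = accuracy(scores, labels)
print(acc)
```

Three of the four predictions match their labels here, which mirrors what `(net(X).argmax(axis=1) == y).sum()` counts per batch before the division by `n`.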

The train_ch5 function:

    # This function has been saved in the d2lzh package for future use
    def train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx,
                  num_epochs):
        # Report the device (CPU or GPU) used for training
        print('training on', ctx)
        # Softmax cross-entropy loss
        loss = gloss.SoftmaxCrossEntropyLoss()
        for epoch in range(num_epochs):
            train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()
            for X, y in train_iter:
                # Copy X and y to the device ctx
                X, y = X.as_in_context(ctx), y.as_in_context(ctx)
                with autograd.record():
                    # Forward pass: compute the predictions
                    y_hat = net(X)
                    # Sum of the per-sample losses in the batch
                    l = loss(y_hat, y).sum()
                l.backward()
                # Update the parameters
                trainer.step(batch_size)
                y = y.astype('float32')
                train_l_sum += l.asscalar()
                train_acc_sum += (y_hat.argmax(axis=1) == y).sum().asscalar()
                n += y.size
            test_acc = evaluate_accuracy(test_iter, net, ctx)
            print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, '
                  'time %.1f sec'
                  % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc,
                     time.time() - start))
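
For reference, the quantity that a softmax cross-entropy loss computes for one sample can be sketched in pure Python. This is an illustration of the formula with made-up logits, not the Gluon implementation.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and an integer class label."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Made-up logits: a confident correct prediction and an uninformative one
confident_loss = softmax_cross_entropy([2.0, 0.5, 0.1], 0)
# Equal scores for all 10 classes, as from an untrained classifier
uniform_loss = softmax_cross_entropy([0.0] * 10, 3)
print(confident_loss, uniform_loss)
```

A prediction that spreads its probability evenly over 10 classes gives a loss of ln 10, about 2.303, which is close to the loss of 2.3082 reported for the first epoch in the training output of this section.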

Reinitialize the model parameters on the device specified by the variable ctx, using Xavier random initialization. The loss function and training algorithm are still the cross-entropy loss and mini-batch stochastic gradient descent.
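
As a rough sketch of these two ingredients, the code below assumes the common Xavier (Glorot) uniform form, in which weights are drawn from [-a, a] with a = sqrt(6 / (fan_in + fan_out)), and a plain SGD update in which the summed batch gradient is divided by the batch size. Both helpers are hypothetical illustrations and the gradient values are made up. Whether MXNet's default `init.Xavier()` uses exactly this bound should be checked against its documentation.

```python
import math
import random


def xavier_uniform_bound(fan_in, fan_out):
    """Bound a of the uniform range [-a, a] in Xavier (Glorot) initialization."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def sgd_step(params, grad_sums, lr, batch_size):
    """One mini-batch SGD update using gradients summed over the batch."""
    return [p - lr * g / batch_size for p, g in zip(params, grad_sums)]


# A layer with 256 inputs and 120 outputs, like the first Dense layer here
bound = xavier_uniform_bound(256, 120)
random.seed(0)
weights = [random.uniform(-bound, bound) for _ in range(5)]

# Made-up parameters and summed gradients, with lr and batch size as in the text
updated = sgd_step([1.0, -2.0], [256.0, -512.0], lr=0.9, batch_size=256)
print(bound)
print(updated)
```

Dividing by the batch size turns the summed gradient into an average, which corresponds to the `batch_size` argument passed to `trainer.step` in the training loop.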

    # Assumption: select the device with d2l.try_gpu() (GPU if available,
    # otherwise CPU)
    ctx = d2l.try_gpu()
    lr, num_epochs = 0.9, 5
    net.initialize(force_reinit=True, ctx=ctx, init=init.Xavier())
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': lr})
    train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx, num_epochs)

Output:

    training on gpu(0)
    epoch 1, loss 2.3082, train acc 0.113, test acc 0.293, time 21.3 sec
    epoch 2, loss 1.2689, train acc 0.492, test acc 0.612, time 16.4 sec
    epoch 3, loss 0.8517, train acc 0.665, test acc 0.730, time 13.6 sec
    epoch 4, loss 0.6973, train acc 0.725, test acc 0.744, time 15.0 sec
    epoch 5, loss 0.6355, train acc 0.748, test acc 0.763, time 20.0 sec