Convolutional neural network (LeNet)

A multilayer perceptron built from fully connected layers can classify the images in the Fashion-MNIST dataset. Each image is 28 pixels high and 28 pixels wide. We expand the pixels row by row into a vector of length 784 and feed it into the fully connected layers. However, this classification method has some limitations.

  1. Pixels that are adjacent in the same column of the image may be far apart in this vector, so the patterns they form may be difficult for the model to recognize.
  2. For large input images, fully connected layers easily make the model too large. Suppose the input is a color photo (3 channels) whose height and width are both \(1000\) pixels. Even if the fully connected layer has only 256 outputs, its weight parameter has shape \(3000000 \times 256\), which occupies about 3 GB of memory or video memory when stored as 32-bit floats. This leads to overly complex models and excessive storage overhead.

Convolutional layers attempt to solve both problems. On the one hand, a convolutional layer retains the input shape, so that correlations between pixels along both the height and the width can be recognized effectively. On the other hand, a convolutional layer applies the same convolution kernel to different positions of the input through a sliding window, which avoids an excessive number of parameters.
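The two limitations can be checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the convolutional layer sizes (256 output channels, a 5 x 5 kernel) are assumed for the sake of comparison rather than taken from the text.

```python
# Illustrative arithmetic only; the helper names below are hypothetical.

def flat_index(row, col, width):
    """Position of pixel (row, col) after expanding an image row by row."""
    return row * width + col

def dense_weight_count(num_inputs, num_outputs):
    """Number of weights in a fully connected layer (bias ignored)."""
    return num_inputs * num_outputs

def conv_weight_count(in_channels, out_channels, kernel_h, kernel_w):
    """Number of weights in a 2D convolutional layer (bias ignored)."""
    return in_channels * out_channels * kernel_h * kernel_w

# Vertically adjacent pixels in a 28 x 28 image end up 28 positions apart.
gap = flat_index(11, 5, 28) - flat_index(10, 5, 28)
print(gap)  # 28

# 1000 x 1000 color photo, 256 outputs, 4 bytes per float32 weight.
dense_weights = dense_weight_count(3 * 1000 * 1000, 256)
dense_gb = dense_weights * 4 / 1e9
print(dense_weights, round(dense_gb, 2))  # 768000000 3.07

# Assumed comparison: a convolutional layer with 256 output channels
# and a 5 x 5 kernel on the same 3-channel input.
conv_weights = conv_weight_count(3, 256, 5, 5)
print(conv_weights)  # 19200
```

The convolutional layer's weight count does not depend on the image height or width, which is why the sliding window keeps the parameter size small.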

LeNet model

LeNet is divided into a convolutional layer block and a fully connected layer block.
The basic unit in the convolutional layer block is a convolutional layer followed by a max pooling layer: the convolutional layer recognizes spatial patterns in the image, and the max pooling layer that follows reduces the convolutional layer's sensitivity to position. In the convolutional layer block, each convolutional layer uses a \(5 \times 5\) window and applies a sigmoid activation function to its output.
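As a rough illustration of how max pooling reduces sensitivity to position, the following pure-Python sketch pools two tiny inputs that differ only by a one-pixel shift. The function `max_pool2d` is a hypothetical helper written for this example, not the Gluon layer, and the 4 x 4 inputs are made up.

```python
def max_pool2d(x, pool_size, stride):
    """Max pooling over a 2D list of numbers (no padding)."""
    out_h = (len(x) - pool_size) // stride + 1
    out_w = (len(x[0]) - pool_size) // stride + 1
    return [[max(x[i * stride + a][j * stride + b]
                 for a in range(pool_size) for b in range(pool_size))
             for j in range(out_w)]
            for i in range(out_h)]

# A single activated pixel, and the same pixel shifted one column to the right.
a = [[0, 0, 0, 0],
     [0, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 0]]
b = [[0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]

pooled_a = max_pool2d(a, 2, 2)
pooled_b = max_pool2d(b, 2, 2)
print(pooled_a)  # [[0, 0], [1, 0]]
print(pooled_b)  # [[0, 0], [1, 0]]
```

The two pooled outputs agree because the shift stays inside one 2 x 2 window; a shift that crosses a window boundary would still change the output.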
The output shape of the convolutional layer block is (batch size, channel, height, width). When the output of the convolutional layer block is passed to the fully connected layer block, the fully connected layer block flattens each sample in the mini-batch. In other words, the input of the fully connected layer becomes two-dimensional: the first dimension indexes the samples in the mini-batch, and the second dimension is the flattened vector representation of each sample, whose length is the product of channel, height and width.
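The flattening step can be sketched with nested Python lists. `flatten_batch` is a hypothetical helper for illustration, and the dummy batch (2 samples, 16 channels of 4 x 4 values) uses assumed sizes.

```python
def flatten_batch(batch):
    """Flatten nested lists of shape (batch, channel, height, width)
    into shape (batch, channel * height * width)."""
    return [[value for channel in sample for row in channel for value in row]
            for sample in batch]

# Dummy batch: 2 samples, each with 16 channels of 4 x 4 values (assumed sizes).
batch = [[[[0.0] * 4 for _ in range(4)] for _ in range(16)] for _ in range(2)]
flat = flatten_batch(batch)
print(len(flat), len(flat[0]))  # 2 256
```

The batch dimension is preserved, and each sample becomes a vector of length 16 * 4 * 4 = 256.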

import d2lzh as d2l
import mxnet as mx
from mxnet import autograd, gluon, init, nd
from mxnet.gluon import loss as gloss, nn
import time
# Build the network
net = nn.Sequential()
# Convolutional layer: 6 output channels, 5 x 5 kernel, sigmoid activation
net.add(nn.Conv2D(channels=6, kernel_size=5, activation='sigmoid'),
        # Max pooling layer: 2 x 2 window, stride 2
        nn.MaxPool2D(pool_size=2, strides=2),
        # Convolutional layer: 16 output channels, 5 x 5 kernel, sigmoid activation
        nn.Conv2D(channels=16, kernel_size=5, activation='sigmoid'),
        # Max pooling layer: 2 x 2 window, stride 2
        nn.MaxPool2D(pool_size=2, strides=2),
        # By default, Dense converts input of shape (batch size, channel, height, width)
        # into input of shape (batch size, channel * height * width)
        nn.Dense(120, activation='sigmoid'),
        nn.Dense(84, activation='sigmoid'),
        nn.Dense(10))

Next, we construct a single-channel data sample with a height and width of 28 and run the forward computation layer by layer to view the output shape of each layer.

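# Create a random input sample of shape (batch size, channel, height, width)
X = nd.random.uniform(shape=(1, 1, 28, 28))
# Initialize the network parameters
net.initialize()
# Pass X through each layer in turn and print the output shape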
for layer in net:
    X = layer(X)
    print(layer.name, 'output shape:\t', X.shape)
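The spatial sizes this loop prints can be anticipated by hand. The sketch below assumes the standard output-size formula for a window without padding, floor((size - kernel) / stride) + 1; `window_out` and the layer labels are hypothetical names used only here.

```python
def window_out(size, kernel, stride=1, padding=0):
    """Output size along one dimension of a sliding-window layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28  # input height and width
trace = []
# (label, kernel size, stride) for each layer in the convolutional block
for label, kernel, stride in [('conv 1', 5, 1), ('pool 1', 2, 2),
                              ('conv 2', 5, 1), ('pool 2', 2, 2)]:
    size = window_out(size, kernel, stride)
    trace.append((label, size))

print(trace)  # [('conv 1', 24), ('pool 1', 12), ('conv 2', 8), ('pool 2', 4)]

# Length of each flattened sample entering the first Dense layer:
# 16 channels times the final height and width.
flat_len = 16 * size * size
print(flat_len)  # 256
```

Under these assumptions the convolutional block turns a 28 x 28 input into 16 channels of 4 x 4, after which the dense layers produce 120, 84 and 10 outputs per sample.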

Acquire data and train the model

We use Fashion-MNIST as the training dataset.

# Batch size: 256
batch_size = 256
# Get training set, test set
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)

evaluate_accuracy function:

# Compute the classification accuracy of net on data_iter
def evaluate_accuracy(data_iter, net, ctx):
    acc_sum, n = nd.array([0], ctx=ctx), 0
    for X, y in data_iter:
        # If ctx represents a GPU and its video memory, copy the data to the video memory
        # Move the inputs and labels to ctx; cast labels to float32 for comparison
        X, y = X.as_in_context(ctx), y.as_in_context(ctx).astype('float32')
        # Count the correct predictions in this batch
        acc_sum += (net(X).argmax(axis=1) == y).sum()
        n += y.size
    # Fraction of correctly classified samples
    return acc_sum.asscalar() / n
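To make the accuracy computation concrete without MXNet, here is a minimal pure-Python sketch of the same idea: take the argmax of each row of scores and compare it with the label. The function name `accuracy` and the scores and labels are made up for illustration.

```python
def accuracy(scores, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score in this row (the predicted class)
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)

# Made-up scores for 4 samples and 3 classes
scores = [[0.1, 0.7, 0.2],
          [0.8, 0.1, 0.1],
          [0.3, 0.3, 0.4],
          [0.2, 0.5, 0.3]]
labels = [1, 0, 1, 1]
acc = accuracy(scores, labels)
print(acc)  # 0.75
```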

train_ch5 function:

# This function has been saved in the d2lzh package for future use
def train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx,
              num_epochs):
    # Report the device (CPU or GPU) used for training
    print('training on', ctx)
    # Softmax cross-entropy loss
    loss = gloss.SoftmaxCrossEntropyLoss()
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()
        for X, y in train_iter:
            # Copy the mini-batch to the device given by ctx
            X, y = X.as_in_context(ctx), y.as_in_context(ctx)
            with autograd.record():
                # Forward pass
                y_hat = net(X)
                # Sum of the per-sample losses over the mini-batch
                l = loss(y_hat, y).sum()
            l.backward()
            # Update the parameters; step() rescales the gradient by 1 / batch_size
            trainer.step(batch_size)
            y = y.astype('float32')
            train_l_sum += l.asscalar()
            train_acc_sum += (y_hat.argmax(axis=1) == y).sum().asscalar()
            n += y.size
        test_acc = evaluate_accuracy(test_iter, net, ctx)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, '
              'time %.1f sec'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc,
                 time.time() - start))

Reinitialize the model parameters on the device variable ctx, using Xavier random initialization. The loss function and training algorithm are still the cross-entropy loss and mini-batch stochastic gradient descent.
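For intuition, the sketch below shows Xavier initialization in its commonly described uniform form, with bound sqrt(6 / (fan_in + fan_out)), and a plain mini-batch SGD update in which the summed gradient is divided by the batch size. Both helpers are hypothetical pure-Python stand-ins; the exact variant applied by init.Xavier() depends on its arguments, and the numbers are made up.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Sample a fan_out x fan_in weight matrix uniformly from [-bound, bound]."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]

def sgd_step(params, grads, lr, batch_size):
    """One SGD update; grads are sums over the mini-batch."""
    return [p - lr * g / batch_size for p, g in zip(params, grads)]

# Weights for a layer with 256 inputs and 120 outputs (assumed sizes)
weights = xavier_uniform(256, 120, random.Random(0))
print(len(weights), len(weights[0]))  # 120 256

# Made-up parameters and summed gradients, learning rate 0.9, batch size 256
updated = sgd_step([1.0, 2.0], [256.0, -512.0], lr=0.9, batch_size=256)
print([round(p, 3) for p in updated])  # [0.1, 3.8]
```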

# Use the GPU if one is available, otherwise fall back to the CPU
ctx = d2l.try_gpu()
lr, num_epochs = 0.9, 5
net.initialize(force_reinit=True, ctx=ctx, init=init.Xavier())
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': lr})
train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx, num_epochs)
# output:
training on gpu(0)
epoch 1, loss 2.3082, train acc 0.113, test acc 0.293, time 21.3 sec
epoch 2, loss 1.2689, train acc 0.492, test acc 0.612, time 16.4 sec
epoch 3, loss 0.8517, train acc 0.665, test acc 0.730, time 13.6 sec
epoch 4, loss 0.6973, train acc 0.725, test acc 0.744, time 15.0 sec
epoch 5, loss 0.6355, train acc 0.748, test acc 0.763, time 20.0 sec

Posted by Diggler on Thu, 14 Apr 2022 17:08:11 +0930