Preface
A neural network is composed of an input layer, hidden layers, and an output layer. Data enters at the input layer and passes through the hidden layers to the output layer, where the output is compared with the training labels to obtain an error; this error is then propagated back through the hidden layers to train the parameters of each layer.
The structure of a typical neural network is shown in the diagram below:
Typical neural networks are used in many settings, such as classification, and have achieved good results.
But what happens when the input layer has too many nodes? The following is a neural network for recognizing handwritten digits:
This is a common application of neural networks. How can a neural network recognize digits?
The simplest idea is to take each pixel of the image as an input node. If a handwritten digit is a 32x32-pixel image, the input layer has 32x32 = 1024 nodes. Assume the hidden layer has the same number of nodes, 1024, and the output layer has 10 nodes (representing the digits 0 to 9). How many parameters need to be trained? 1024x1024 + 1024x10 = 1,058,816, which is a frightening number of parameters.
The brain's perception of an image is an iterative, layer-by-layer process of abstraction: from perceiving pixels to extracting the outline of an object, the brain works layer by layer, and not all of the raw data is needed in this process. What the machine needs is the important information in the image. By extracting the image's features layer by layer and then training a BP neural network on those features, we obtain what is called a convolutional neural network.
In 1998, the famous Yann LeCun proposed the convolutional neural network algorithm and applied it to handwritten digit recognition.
Convolutional neural network
Most of the ideas behind convolutional neural networks are the same as in BP neural networks; a convolutional neural network can be regarded as an extension and improvement of the BP neural network whose purpose is to reduce the number of training parameters. How does a convolutional neural network reduce the number of training parameters? It mainly introduces two special kinds of layers on top of the BP neural network: the convolution layer and the pooling layer.
The convolution process is as follows:
Convolution process: convolution -> pooling (subsampling) -> convolution -> pooling (subsampling) -> fully connected layer.
Convolution layer
The convolution layer is computed by sliding a convolution kernel over the input of the previous layer, window by window. Each parameter in the kernel is equivalent to a weight parameter in a traditional neural network and is connected to the corresponding local pixel. Each kernel parameter is multiplied by its corresponding local pixel value and the products are summed (usually plus a bias parameter) to obtain one value of the convolution layer's output.
For simplicity, consider a 5x5 image and a 3x3 convolution kernel. The kernel has nine parameters, written as Θ = [θ_ij] (3x3). In this case the kernel effectively corresponds to 9 neurons, whose outputs form a 3x3 matrix called a feature map. The first neuron is connected to the first 3x3 patch of the image, the second neuron to the second patch, and so on. The details are shown in the figure below.
By the definition of convolution, the operation can simply be regarded as a weighted sum of the local image data with the kernel parameters; computing this weighted sum at every window position is the convolution of the image.
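To make this concrete, here is a minimal standalone sketch in plain NumPy (with hypothetical values, not the library code shown later) that slides a 3x3 kernel over a 5x5 image and produces the 3x3 feature map described above:

```python
import numpy as np

# hypothetical 5x5 input image and 3x3 kernel, values chosen only for illustration
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])

out_h = image.shape[0] - kernel.shape[0] + 1   # 5 - 3 + 1 = 3
out_w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((out_h, out_w))

for y in range(out_h):
    for x in range(out_w):
        # weighted sum of the 3x3 window with the kernel parameters
        window = image[y:y + 3, x:x + 3]
        feature_map[y, x] = np.sum(window * kernel)

print(feature_map.shape)   # (3, 3): one value per kernel position
```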
After the image passes through the convolution layer, a bias is added and an activation function is applied.
The forward pass of the convolution layer is summarized as follows:
```python
def propagate(self):
    FMs = np.zeros([self.currLayer.get_n(), self.currLayer.shape()[0], self.currLayer.shape()[1]])
    inFMs = self.prevLayer.get_FM()
    k = 0  # kernel index, there is one for each (i, j) combination
    for j in range(self.currLayer.get_n()):        # for each FM in the current layer
        for i in range(self.prevLayer.get_n()):    # for each FM in the previous layer
            if self.connections[i, j] == 1:
                # for each neuron in the feature map
                for y_out in range(self.currLayer.shape()[0]):
                    for x_out in range(self.currLayer.shape()[1]):
                        # iterate inside the visual field for that neuron
                        for y_k in range(0, self.kernelHeight, self.stepY):
                            for x_k in range(0, self.kernelWidth, self.stepX):
                                FMs[j, y_out, x_out] += inFMs[i, y_out + y_k, x_out + x_k] * self.k[k, y_k, x_k]
                        # add bias
                        FMs[j, y_out, x_out] += 1 * self.biasWeights[j]
                # next kernel
                k += 1
        # compute sigmoid (of a matrix, since it is faster than elementwise)
        FMs[j] = self.act.func(FMs[j])
    self.currLayer.set_FM(FMs)
    return FMs
```
Returning to the overall diagram: after the original 32x32 image comes in, it first enters convolution layer C1, which uses six 5x5 convolution kernels to produce six 28x28 feature maps (28 = 32 - 5 + 1). C1 has 156 trainable parameters: each filter has 5x5 = 25 weight parameters plus one bias parameter, and there are 6 filters, giving (5x5 + 1) x 6 = 156 parameters in total. Note that each convolution kernel has only 25 weights, shared across every position of the image, rather than a separate weight for every input pixel as in a fully connected layer.
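As a rough illustration of the savings (a back-of-the-envelope calculation, not part of the library code), compare the parameter count of C1 with a fully connected layer mapping the same 32x32 input to six 28x28 output maps:

```python
# C1: six 5x5 kernels, one bias each, weights shared across all image positions
conv_params = (5 * 5 + 1) * 6
print(conv_params)   # 156

# a fully connected layer producing the same six 28x28 outputs from 32x32 inputs
fc_params = (32 * 32) * (6 * 28 * 28) + 6 * 28 * 28
print(fc_params)     # 4821600
```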
Pooling layer
Pooling, i.e. downsampling, aims to reduce the size of the feature maps. The pooling operation is applied independently to each depth slice, and the window is generally 2x2. Compared with the convolution layer, the operations performed by the pooling layer are generally one of the following:
- Max pooling: take the maximum of the 4 values. This is the most commonly used pooling method.
- Mean pooling: take the mean of the 4 values.
- Gaussian pooling: borrows from the Gaussian blur technique. Not commonly used.
- Trainable pooling: a trainable function f takes the 4 values as input and outputs 1 value. Not commonly used.
The most common pooling layer uses a 2x2 window with a stride of 2 and downsamples each depth slice of the input. Each MAX operation is taken over four numbers, as shown in the figure below.
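Numerically (a standalone NumPy sketch with made-up values, separate from the library code that follows), 2x2 max pooling with stride 2 turns a 4x4 feature map into a 2x2 one:

```python
import numpy as np

fm = np.array([[1., 3., 2., 1.],
               [4., 6., 5., 0.],
               [7., 2., 9., 8.],
               [1., 0., 3., 4.]])

pooled = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        # take the maximum of each non-overlapping 2x2 window
        pooled[i, j] = fm[2*i:2*i + 2, 2*j:2*j + 2].max()

print(pooled)   # [[6. 5.]
                #  [7. 9.]]
```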
Taking the maximum value is implemented as follows:
```python
def propagate(self):
    [prevSizeY, prevSizeX] = self.prevLayer.shape()
    [currSizeY, currSizeX] = self.currLayer.shape()
    # remember where each maximum came from, so bprop can route the error back
    self.maximaLocationsX = np.zeros([self.currLayer.get_n(), currSizeY, currSizeX], dtype=int)
    self.maximaLocationsY = np.zeros([self.currLayer.get_n(), currSizeY, currSizeX], dtype=int)
    pooledFM = np.zeros([self.currLayer.get_n(), currSizeY, currSizeX])
    yi = self.prevLayer.get_FM()
    for n in range(self.prevLayer.get_n()):
        for i in range(currSizeY):
            for j in range(currSizeX):
                # pooling window of the input that feeds this output neuron
                reg = yi[n, i*self.poolingStepY:(i+1)*self.poolingStepY,
                            j*self.poolingStepX:(j+1)*self.poolingStepX]
                loc = np.unravel_index(reg.argmax(), reg.shape) + np.array([i*self.poolingStepY, j*self.poolingStepX])
                self.maximaLocationsY[n, i, j] = loc[0]
                self.maximaLocationsX[n, i, j] = loc[1]
                pooledFM[n, i, j] = yi[n, loc[0], loc[1]]
    self.currLayer.set_FM(pooledFM)
```
After the pooling layer, the image data is reduced further and its dimensions shrink.
After one round of convolution and pooling, the data passes through further convolution and pooling layers; the parameters differ, but the training method is the same.
Fully connected layer
This layer is no different from a BP neural network; its main job is to turn the extracted features into the final result.
This layer takes an input volume (whatever the output of the preceding convolution, ReLU, or pooling layer is) and outputs an N-dimensional vector, where N is the number of classes the program chooses from. The fully connected layer works by looking at the output of the previous layer (the activation maps that represent high-level features) and determining which features correlate most strongly with each particular class. For example, if the program predicts that an image is a dog, it will have high values in the activation maps that represent high-level features such as a paw or four legs. Similarly, if the program predicts that an image is a bird, it will have high values in the activation maps that represent high-level features such as wings or a beak.
The output is computed as follows:
```python
def propagate(self):
    x = self.prevLayer.get_x()[np.newaxis]
    if self.currLayer.hasBias:
        x = np.append(x, [1])
    z = np.dot(self.w.T, x)
    # compute and store output
    y = self.act.func(z)
    self.currLayer.set_x(y)
    return y
```
The whole network can be organized in the following way:
```python
inputLayer0  = layerFM(1, 32, 32, isInput = True)
convLayer1   = layerFM(6, 28, 28)
poolLayer2   = layerFM(6, 14, 14)
convLayer3   = layerFM(16, 10, 10)
poolLayer4   = layerFM(16, 5, 5)
convLayer5   = layerFM(100, 1, 1)
hiddenLayer6 = layer1D(80)
outputLayer7 = layer1D(10, isOutput = True)

convolution01 = convolutionalConnection(inputLayer0, convLayer1, np.ones([1, 6]), 5, 5, 1, 1)
pooling12     = poolingConnection(convLayer1, poolLayer2, 2, 2)
convolution23 = convolutionalConnection(poolLayer2, convLayer3, np.ones([6, 16]), 5, 5, 1, 1)
pooling34     = poolingConnection(convLayer3, poolLayer4, 2, 2)
convolution45 = convolutionalConnection(poolLayer4, convLayer5, np.ones([16, 100]), 5, 5, 1, 1)
full56        = fullConnection(convLayer5, hiddenLayer6)
full67        = fullConnection(hiddenLayer6, outputLayer7)
```
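Note that the layer sizes follow from the formulas above: each 5x5 convolution shrinks a feature map by 4 (32 -> 28, 14 -> 10, 5 -> 1), and each 2x2 pooling with stride 2 halves it (28 -> 14, 10 -> 5). The resulting 100 feature maps of size 1x1 then feed the 80-unit hidden layer and the 10-class output layer.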
Training process
The parameters of a convolutional neural network are also trained with back propagation, in the same way as a BP neural network: the error is propagated backwards and the trainable parameters are adjusted step by step.
Error calculation and weight update for the fully connected layer:
```python
def bprop(self, ni, target=None, verbose=False):
    yj = self.currLayer.get_x()
    if verbose:
        print("out = ", yj)
        print("w = ", self.w)
    # compute or retrieve the error of the current layer
    if self.currLayer.isOutput:
        if target is None:
            raise Exception("bprop(): target values needed for output layer")
        currErr = -(target - yj) * self.act.deriv(yj)
        self.currLayer.set_error(currErr)
    else:
        currErr = self.currLayer.get_error()
    if verbose:
        print("currErr = ", currErr)
    yi = np.append(self.prevLayer.get_x(), [1])
    # compute the error of the previous layer
    if not self.prevLayer.isInput:
        prevErr = np.zeros(len(yi))
        for i in range(len(yi)):
            prevErr[i] = sum(currErr * self.w[i]) * self.act.deriv(yi[i])
        self.prevLayer.set_error(np.delete(prevErr, -1))
    # compute weight updates
    dw = np.dot(np.array(yi)[np.newaxis].T, np.array(currErr)[np.newaxis])
    self.w -= ni * dw
```
Error calculation for the pooling layer:
```python
def bprop(self):
    currErr = self.currLayer.get_FM_error()
    prevErr = np.zeros([self.prevLayer.get_n(), self.prevLayer.shape()[0], self.prevLayer.shape()[1]])
    [currSizeY, currSizeX] = self.currLayer.shape()
    for n in range(self.prevLayer.get_n()):
        for i in range(currSizeY):
            for j in range(currSizeX):
                # route the error only to the position that produced the maximum
                prevErr[n, self.maximaLocationsY[n, i, j], self.maximaLocationsX[n, i, j]] = currErr[n, i, j]
    self.prevLayer.set_FM_error(prevErr)
```
Error calculation and weight update for the convolution layer:
```python
def bprop(self, ni, target=None, verbose=False):
    yi = self.prevLayer.get_FM()  # get output of previous layer
    yj = self.currLayer.get_FM()  # get output of current layer
    # TODO: a conv. layer cannot be an output, remove the error-computing part
    if not self.currLayer.isOutput:
        currErr = self.currLayer.get_FM_error()
    else:
        currErr = -(target - yj) * self.act.deriv(yj)
        self.currLayer.set_FM_error(currErr)

    # compute error in the previous layer
    prevErr = np.zeros([self.prevLayer.get_n(), self.prevLayer.shape()[0], self.prevLayer.shape()[1]])
    biasErr = np.zeros([self.currLayer.get_n()])
    k = 0
    for j in range(self.currLayer.get_n()):        # for each FM in the current layer
        for i in range(self.prevLayer.get_n()):    # for each FM in the previous layer
            if self.connections[i, j] == 1:
                # for each neuron in the feature map
                for y_out in range(self.currLayer.shape()[0]):
                    for x_out in range(self.currLayer.shape()[1]):
                        # iterate inside the visual field for that neuron
                        for y_k in range(0, self.kernelHeight, self.stepY):
                            for x_k in range(0, self.kernelWidth, self.stepX):
                                prevErr[i, y_out + y_k, x_out + x_k] += self.k[k, y_k, x_k] * currErr[j, y_out, x_out]
                        # add bias
                        biasErr[j] += currErr[j, y_out, x_out] * self.k[k, y_k, x_k]
                # next kernel
                k += 1
    for i in range(self.prevLayer.get_n()):
        prevErr[i] = prevErr[i] * self.act.deriv(yi[i])
    for j in range(self.currLayer.get_n()):
        biasErr[j] = biasErr[j] * self.act.deriv(1)
    self.prevLayer.set_FM_error(prevErr)

    # compute weight updates
    dw = np.zeros(self.k.shape)
    dwBias = np.zeros(self.currLayer.get_n())
    k = 0
    for j in range(self.currLayer.get_n()):        # for each FM in the current layer
        for i in range(self.prevLayer.get_n()):    # for each FM in the previous layer
            if self.connections[i, j] == 1:
                # for each neuron in the feature map
                for y_out in range(self.currLayer.shape()[0]):
                    for x_out in range(self.currLayer.shape()[1]):
                        # iterate inside the visual field for that neuron
                        for y_k in range(0, self.kernelHeight, self.stepY):
                            for x_k in range(0, self.kernelWidth, self.stepX):
                                dw[k, y_k, x_k] += yi[i, y_out + y_k, x_out + x_k] * currErr[j, y_out, x_out]
                        # add bias
                        dwBias[j] += 1 * currErr[j, y_out, x_out]
                # next kernel
                k += 1

    # update weights
    self.k -= ni * dw
    self.biasWeights -= ni * dwBias
```
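Putting the pieces together, one training step over a single example might look like the sketch below. This is a hypothetical driver loop written against the method signatures shown above (propagate(), bprop(ni, target), bprop(ni), bprop()); details such as how the input image is assigned and how the 1x1 feature maps are flattened before the fully connected layers depend on the library and are assumptions here.

```python
import numpy as np

ni = 0.01                                  # learning rate
image = np.random.rand(1, 32, 32)          # placeholder input: 1 channel, 32x32 pixels
target = np.zeros(10)
target[3] = 1                              # one-hot label, class "3" chosen arbitrarily

# forward pass through every connection, input to output
inputLayer0.set_FM(image)
for conn in [convolution01, pooling12, convolution23, pooling34,
             convolution45, full56, full67]:
    conn.propagate()

# backward pass in reverse order; only the output connection needs the target
full67.bprop(ni, target)
full56.bprop(ni)
convolution45.bprop(ni)
pooling34.bprop()
convolution23.bprop(ni)
pooling12.bprop()
convolution01.bprop(ni)

prediction = np.argmax(outputLayer7.get_x())
```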
Summary
The convolutional neural network is a widely used kind of network. Through convolution and pooling it greatly reduces the number of training parameters, so the error can be driven down quickly and the network can be trained effectively. Convolutional neural networks have been widely applied to image classification and have achieved good results. They are an improvement and extension of the BP neural network that makes it usable on large input data.