Neural networks are part of the research field of artificial intelligence. At present, the most popular type of neural network is the deep convolutional neural network (CNN). Shallow convolutional networks also exist, but they are rarely used because of their limited accuracy and expressiveness. Today, academia and industry no longer distinguish between "deep CNNs" and "CNNs": the term generally refers to convolutional neural networks with a deep structure, whose number of layers ranges from a few to tens or even hundreds. CNNs have achieved great success in many research fields, such as speech recognition, image recognition, image segmentation, and natural language processing.
1.2 network structure
A basic CNN is composed of three structures: convolution, activation, and pooling. The output of the CNN is a feature space specific to each image. For an image classification task, the feature space output by the CNN is used as the input of a fully connected neural network (FCN), and the fully connected layers complete the mapping from the input image to the label set, that is, the classification. The most important part of the whole process is how the network weights are adjusted iteratively from the training data, that is, the backpropagation algorithm. Today's mainstream convolutional neural networks, such as VGG and ResNet, are built by adjusting and combining this simple CNN structure.
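As a rough illustration of the last step, here is a minimal sketch of how a fully connected layer followed by softmax maps a feature vector to a label. The feature values, weights, and label names are made up for the example; in a real network the weights would be learned by backpropagation.

```python
import math

def dense(features, weights, biases):
    """Fully connected layer: one weighted sum of all features per output unit."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical flattened CNN features and hand-picked weights for 3 labels.
features = [0.5, 1.0, -0.5, 2.0]
weights = [
    [0.1, 0.2, 0.0, 0.1],
    [0.0, -0.1, 0.3, 0.9],
    [-0.2, 0.1, 0.1, 0.0],
]
biases = [0.0, 0.1, 0.0]
labels = ["cat", "dog", "bird"]

probs = softmax(dense(features, weights, biases))
predicted = labels[probs.index(max(probs))]  # label with the highest probability
print(predicted, [round(p, 3) for p in probs])
```

The predicted label is simply the output unit with the largest probability, which is what the final softmax layer of a classifier does.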
For example, if a 277 * 277 RGB image is scanned simultaneously by 96 kernels of size 11 * 11 * 3 (with stride 1 and no padding), the output is 96 two-dimensional feature maps of size 267 * 267, since 277 - 11 + 1 = 267. Here 267 * 267 is the size of a single feature map along the X and Y axes, and 96 is the number of convolution kernels. The three original channels are summed into a single value at each position during the convolution. As shown in the figure above, once these feature maps are visualized, it can be seen that maps 4 and 35 represent edge features, 23 is a blurred version of the input, 10 and 16 emphasize gray-level changes, 39 emphasizes the eyes, and 45 emphasizes the red channel.
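The size calculation in this example can be checked with a small helper. This is only a sketch: the function name is made up, and it assumes stride 1 and no padding for the 277 * 277 case, which is what the 267 * 267 result implies.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Side length of the feature map produced by a square convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# 277 x 277 input, 11 x 11 kernel, stride 1, no padding (assumed).
side = conv_output_size(277, 11)
num_kernels = 96  # one feature map per kernel
print(f"{num_kernels} feature maps of {side} x {side}")
```

The same helper shows why a 3 * 3 kernel with one pixel of padding keeps the size unchanged: `conv_output_size(64, 3, padding=1)` gives 64.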
Pooling is a subsampling operation. Its main goal is to reduce the feature space of the feature maps, which can also be seen as reducing their resolution. It is used because the feature maps contain too many values, and fine image details are not helpful for extracting high-level features.
At present, the main pooling operations are:
- Max pooling: as shown in the figure above, 2 * 2 max pooling keeps the maximum value of each group of 4 pixels
- Average pooling: as shown in the figure above, 2 * 2 average pooling keeps the average value of each group of 4 pixels
- L2 pooling: the L2 norm of each group (the square root of the sum of squares) is kept
Pooling reduces the amount of data and the resolution of the feature maps, but it is not clear whether such a drastic reduction is necessary when enough computing power is available. Some large CNNs now use pooling only occasionally.
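The three pooling operations can be sketched in a few lines of plain Python. The helper names and the 4 * 4 feature map are made up for the example, and the windows are assumed to be non-overlapping (stride 2).

```python
import math

def pool2x2(feature_map, reduce_fn):
    """Apply reduce_fn to every non-overlapping 2 x 2 window (stride 2)."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            reduce_fn([
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]

def mean(values):
    return sum(values) / len(values)

def l2(values):
    # L2 norm: square root of the sum of squares
    return math.sqrt(sum(v * v for v in values))

fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 3, 4],
    [0, 2, 0, 0],
]

max_pooled = pool2x2(fmap, max)   # [[4, 2], [2, 4]]
avg_pooled = pool2x2(fmap, mean)  # [[2.5, 1.0], [0.5, 1.75]]
l2_pooled = pool2x2(fmap, l2)     # second row is [2.0, 5.0]
print(max_pooled, avg_pooled, l2_pooled)
```

In every case the 4 * 4 map becomes a 2 * 2 map, so each pooling step cuts the number of values to a quarter.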
The above is the basic structure of one CNN stage. It should be emphasized that this structure is flexible: most current networks are built by stacking this basic structure with adjusted parameters, or by adding skip connections between layers. The output of a CNN is a set of feature maps, which can be fed into a fully connected network for classification, but can also be connected to another "mirror" CNN. If the feature maps output by the new CNN have the same dimensions as the input image, that is, if the newly connected CNN performs upsampling, the resulting image can be regarded as a pixel-level annotation, in other words an image segmentation.
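To make the "mirror" idea concrete, here is a minimal sketch in which a 2 * 2 max pooling step halves the resolution and a nearest-neighbour upsampling step restores it. Nearest-neighbour upsampling is an assumption chosen for simplicity; real segmentation networks often use learned upsampling layers such as transposed convolutions.

```python
def max_pool2x2(fmap):
    """Downsample: keep the maximum of every non-overlapping 2 x 2 window."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]

def upsample2x(fmap):
    """Upsample: repeat every value in a 2 x 2 block (nearest neighbour)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat along the X axis
        out.append(wide)
        out.append(list(wide))                     # repeat along the Y axis
    return out

image = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 3, 4],
    [0, 2, 0, 0],
]

pooled = max_pool2x2(image)    # 2 x 2
restored = upsample2x(pooled)  # back to 4 x 4, one value per input pixel
for row in restored:
    print(row)
```

Because the restored map has one value per input pixel, each value can be read as a label for that pixel, which is the basis of pixel-level segmentation.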
2.1 data sets
There are 11 actions in total, and each action has 30 images.
    import pathlib

    import matplotlib.pyplot as plt
    import tensorflow as tf

    # Prepare data
    data_dir = pathlib.Path("All_Resize")  # root folder with one sub-folder per class (c1, c2, ...)
    image_count = len(list(data_dir.glob("*/*")))  # count the images
    print("Total number of images:", image_count)

    # Parameter settings
    batch_size = 4
    image_height = 64
    image_width = 64
    epochs = 10

    # Build an ImageDataGenerator.
    # The training set and test set are in the same folder, so one generator is enough.
    train_data_gen = tf.keras.preprocessing.image.ImageDataGenerator(
        rescale=None,          # rescaling factor (default None); if None or 0 no rescaling is done,
                               # otherwise the data is multiplied by it before other transformations
        rotation_range=45,     # degree range for random rotations
        shear_range=0.2,       # float, shear intensity (counterclockwise shear angle)
        validation_split=0.2,  # split training and test sets in the ratio 8:2
        horizontal_flip=True,  # boolean, random horizontal flip
    )

    # Partition the dataset
    train_ds = train_data_gen.flow_from_directory(
        directory=data_dir,
        target_size=(image_height, image_width),
        batch_size=batch_size,
        class_mode="categorical",  # the default; determines the form of the returned label array
        shuffle=True,
        subset="training",
    )
    print(train_ds.image_shape)

    test_ds = train_data_gen.flow_from_directory(
        directory=data_dir,
        target_size=(image_height, image_width),
        batch_size=batch_size,
        class_mode="categorical",
        subset="validation",
        shuffle=True,
    )

    # Build the CNN: 4 convolution + pooling blocks, Flatten, fully connected layers.
    # Note: the pooling layers use strides=1, so they do not reduce the feature map size.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu",
                               input_shape=(image_height, image_width, 3)),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=1, padding="same"),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=1, padding="same"),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=1, padding="same"),
        tf.keras.layers.Conv2D(128, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=1, padding="same"),
        tf.keras.layers.Flatten(),
        # With 1024 and 512 units in the two layers below, my computer could not
        # run the model (1.4 billion parameters)
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(11, activation="softmax"),
    ])

    # Compile the model. The labels are one-hot vectors over 11 classes,
    # so categorical cross-entropy is the matching loss.
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=["acc"])

    # Train the model
    history = model.fit(
        train_ds,
        validation_data=test_ds,
        epochs=epochs,
    )

    # Save the model
    # model.save_weights('CNN_model.h5')

    model.summary()  # print the model architecture

    # View the trainable variables
    print(model.trainable_variables)
    # file = open('./weight_CNN.txt',)
    # for v in model.trainable_variables:
    #     file.write(str(v.name)+'\n')
    #     file.write(str(v.shape)+'\n')
    #     file.write(str(v.numpy())+'\n')
    # file.close()

    # ---------- Show results ----------
    # Display the accuracy and loss of the training set and test set.
    # While model.fit() trains, it records the loss and accuracy of both sets
    # for every epoch; they are available through history.history and are
    # plotted here with matplotlib.
    acc = history.history["acc"]
    val_acc = history.history["val_acc"]
    loss = history.history["loss"]
    val_loss = history.history["val_loss"]

    # Plot accuracy in the first column of a 1 x 2 grid
    plt.subplot(1, 2, 1)
    plt.plot(acc, label="Training Accuracy")
    plt.plot(val_acc, label="Validation Accuracy")
    plt.title("Training and Validation Accuracy")
    plt.legend()

    # Plot loss in the second column
    plt.subplot(1, 2, 2)
    plt.plot(loss, label="Training Loss")
    plt.plot(val_loss, label="Validation Loss")
    plt.title("Training and Validation Loss")
    plt.legend()
    plt.show()
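The size of the first fully connected layer depends on how large the feature maps still are when they reach Flatten. The sketch below traces the feature map side length through the four convolution + pooling blocks of the model above, assuming the settings in the listing (64 * 64 input, 3 * 3 kernels, "same" padding on the first convolution and the Keras default "valid" on the others, "same" padding on every pooling layer). The stride 2 case is a hypothetical alternative for comparison, not the setting used here.

```python
import math

def conv_out(size, kernel, padding):
    # At stride 1, "same" keeps the size and "valid" shrinks it by kernel - 1.
    return size if padding == "same" else size - kernel + 1

def pool_out(size, stride):
    # With padding="same", the pooled size is ceil(size / stride).
    return math.ceil(size / stride)

def flatten_size(pool_stride, input_size=64, last_filters=128):
    """Return (side length, number of flattened features) before the Dense layers."""
    size = input_size
    for padding in ["same", "valid", "valid", "valid"]:  # the four Conv2D layers
        size = pool_out(conv_out(size, 3, padding), pool_stride)
    return size, size * size * last_filters

side1, flat1 = flatten_size(pool_stride=1)  # pooling as in the listing
side2, flat2 = flatten_size(pool_stride=2)  # hypothetical: pooling that halves the size

dense_params1 = flat1 * 100 + 100  # weights + biases of the first Dense(100)
dense_params2 = flat2 * 100 + 100

print(side1, flat1, dense_params1)
print(side2, flat2, dense_params2)
```

With stride 1 the pooling layers do not shrink the feature maps, so almost all of the model's weights end up in the first fully connected layer. This is consistent with the observation that much wider Dense layers were too large to run.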
2.3 training results
To save time, only 10 epochs are used here. With 200 epochs, the accuracy reaches about 95%.