```python
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image


def cov2(img, kernel, strde):
    """Slide `kernel` over `img` with stride `strde` (no padding).
    The kernel is not flipped, i.e. this is the cross-correlation used in CNNs."""
    inw, inh = img.shape
    w, h = kernel.shape
    # output size along each axis: (input - kernel) / stride + 1
    outw = int((inw - w) / strde + 1)
    outh = int((inh - h) / strde + 1)
    arr = np.zeros((outw, outh))
    for g in range(outw):
        for t in range(outh):
            s = 0
            # multiply the overlapping entries and accumulate them
            for i in range(w):
                for j in range(h):
                    s += img[i + g * strde][j + t * strde] * kernel[i][j]
            arr[g][t] = s
    return arr


# Figure 1: 7x6 image, dark (0) left half, bright (255) right half
img = []
for i in range(7):
    temp = [0, 0, 0, 255, 255, 255]
    img.append(temp)
img = np.array(img)
print("original image\n", img)
kernel = np.array([[1, -1]])
test_pic = cov2(img, kernel, 1)
print("Figure 1 with the [1,-1] convolution kernel\n", test_pic)
kernel = np.array([[1], [-1]])
test_pic = cov2(img, kernel, 1)
print("Figure 1 with the transposed [1,-1] convolution kernel\n", test_pic)
print("------------------------------------")

# Figure 2: the rows of Figure 1, followed by 7 rows with the halves swapped
img = []
for i in range(7):
    temp = [0, 0, 0, 255, 255, 255]
    img.append(temp)
for i in range(7):
    temp = [255, 255, 255, 0, 0, 0]
    img.append(temp)
img = np.array(img)
print("original image\n", img)
kernel = np.array([[1, -1]])
test_pic = cov2(img, kernel, 1)
print("Figure 2 with the [1,-1] convolution kernel\n", test_pic)
kernel = np.array([[1], [-1]])
test_pic = cov2(img, kernel, 1)
print("Figure 2 with the transposed [1,-1] convolution kernel\n", test_pic)

# Figure 3: 9x9 image of an "X"
img = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
                [0, 255, 0, 0, 0, 0, 0, 255, 0],
                [0, 0, 255, 0, 0, 0, 255, 0, 0],
                [0, 0, 0, 255, 0, 255, 0, 0, 0],
                [0, 0, 0, 0, 255, 0, 0, 0, 0],
                [0, 0, 0, 255, 0, 255, 0, 0, 0],
                [0, 0, 255, 0, 0, 0, 255, 0, 0],
                [0, 255, 0, 0, 0, 0, 0, 255, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 0]])
print("original image\n", img)
kernel = np.array([[1, -1]])
test_pic = cov2(img, kernel, 1)
print("Figure 3 with the [1,-1] convolution kernel\n", test_pic)
kernel = np.array([[1], [-1]])
test_pic = cov2(img, kernel, 1)
print("Figure 3 with the transposed [1,-1] convolution kernel\n", test_pic)
kernel = np.array([[1, -1], [-1, 1]])
test_pic = cov2(img, kernel, 1)
print("Figure 3 with the [[1,-1],[-1,1]] convolution kernel\n", test_pic)
# test_im = Image.fromarray(test_pic)
# test_im.show()
```
Convolution: first, the mathematical definition of one-dimensional convolution:
One-dimensional convolution is straightforward: flip one signal, shift it, multiply element-wise and accumulate the products.
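As a sketch of those four steps, assuming the standard discrete definition y[n] = Σ_k x[k]·h[n − k] (full output of length len(x) + len(h) − 1), the loop below flips h, shifts it across a zero-padded x, multiplies and sums. The function name `conv1d` and the sample signals are illustrative; `np.convolve` is only used as a cross-check.

```python
import numpy as np


def conv1d(x, h):
    """Full 1-D convolution y[n] = sum_k x[k] * h[n - k],
    written out as flip -> shift -> multiply -> accumulate."""
    x = np.asarray(x, dtype=float)
    h_flipped = np.asarray(h, dtype=float)[::-1]      # 1. flip (inversion)
    n_out = len(x) + len(h) - 1
    # zero-pad x so the flipped kernel can slide fully onto and off it
    x_pad = np.pad(x, (len(h) - 1, len(h) - 1))
    y = np.zeros(n_out)
    for n in range(n_out):                            # 2. shift by n
        window = x_pad[n:n + len(h)]
        y[n] = np.sum(window * h_flipped)             # 3. multiply, 4. accumulate
    return y


x = [1, 2, 3]
h = [0, 1, 0.5]
print(conv1d(x, h))        # values 0, 1, 2.5, 4, 1.5
print(np.convolve(x, h))   # NumPy gives the same values
```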
Two-dimensional convolution is an operation between two matrices. One of them is treated as the "convolution kernel": it is placed over a patch of the other matrix, the entries at corresponding positions are multiplied and summed, and the result is a single (1×1) value; repeating this at every position gives the output matrix. In practice, the role of two-dimensional convolution is feature extraction. Where the kernel finds a similar pattern in the other matrix, the output value at that position is noticeably larger, so two-dimensional convolution can be thought of as a matrix searching another matrix for a "good friend" that resembles itself.
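To make the "good friend" idea concrete, here is a minimal sketch (the helper `correlate2d_valid` and the 5×5 test image are hypothetical): a 2×2 kernel shaped like a "\" diagonal is slid over an image that contains that diagonal once, and the output is largest exactly where the pattern sits. As in most CNN code, the kernel is not flipped (cross-correlation).

```python
import numpy as np


def correlate2d_valid(img, kernel):
    """Slide `kernel` over `img` (stride 1, no padding); at each position
    multiply the overlapping entries and sum them into one output value."""
    ih, iw = img.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out


# A "\" diagonal hidden in a 5x5 image at rows 1-2, columns 2-3
img = np.zeros((5, 5))
img[1, 2] = img[2, 3] = 1

kernel = np.array([[1, -1],
                   [-1, 1]])   # positive on the diagonal: it "looks like" the pattern

response = correlate2d_valid(img, kernel)
print(response)
r, c = np.unravel_index(np.argmax(response), response.shape)
print(f"strongest match at row {r}, col {c}")   # → row 1, col 2
```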
A schematic of feature extraction with two-dimensional convolution, from this blog: https://blog.csdn.net/kingroc/article/details/88192878
Three-dimensional convolution: here is a very intuitive picture I found, from this blog: 3D convolution
In 3D convolution, the 3D kernel can move in all three directions of the input (image height, width and channel). At each position, element-wise multiplication followed by summation yields a single value. Because the filter slides through a 3D space, the output values are also arranged in 3D space; that is, the output is 3D data.
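A minimal sketch of that sliding behaviour, assuming a random 5×6×6 volume and a 3×3×3 averaging kernel (both arbitrary choices for illustration): the kernel moves along all three axes with stride 1 and no padding, each position produces one number, and the output is itself a 3D array.

```python
import numpy as np


def conv3d_valid(vol, kernel):
    """Slide a 3-D kernel along all three axes of a 3-D volume
    (stride 1, no padding); each position yields one number."""
    D, H, W = vol.shape
    kd, kh, kw = kernel.shape
    out = np.zeros((D - kd + 1, H - kh + 1, W - kw + 1))
    for d in range(out.shape[0]):
        for r in range(out.shape[1]):
            for c in range(out.shape[2]):
                patch = vol[d:d + kd, r:r + kh, c:c + kw]
                out[d, r, c] = np.sum(patch * kernel)   # multiply element-wise, then sum
    return out


rng = np.random.default_rng(0)
volume = rng.random((5, 6, 6))        # hypothetical 5 x 6 x 6 input
kernel = np.ones((3, 3, 3)) / 27      # 3x3x3 averaging kernel
out = conv3d_valid(volume, kernel)
print(out.shape)                      # → (3, 4, 4)
```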
Convolution kernel: as I understand it, the convolution kernel is the matrix that is slid over another matrix to extract features from it. Again taking two-dimensional convolution as an example:
Convolution with multiple channels and multiple kernels: each kernel has one 2D slice per input channel. Every slice is convolved with its own channel, the per-channel results are summed into a single feature map, and the feature maps produced by the different kernels are stacked along the channel axis to form the output. Blog post: Machine Learning 28: Multi-Convolution Kernel Processing Multi-Channel Feature Map Mechanism
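The shapes involved are easiest to see in code. This sketch follows the convention of standard CNN convolution layers, where the per-channel results of one kernel are summed into one feature map; the function name `conv_multi`, the 3×8×8 "RGB" input and the four random 3×3 kernels are all hypothetical.

```python
import numpy as np


def conv_multi(x, kernels):
    """x: input of shape (C_in, H, W); kernels: (C_out, C_in, kh, kw).
    Each kernel applies one 2-D slice per input channel, the per-channel
    results are summed into one feature map, and the C_out maps are stacked."""
    c_in, H, W = x.shape
    c_out, k_c_in, kh, kw = kernels.shape
    assert c_in == k_c_in, "kernel depth must match the number of input channels"
    out = np.zeros((c_out, H - kh + 1, W - kw + 1))
    for o in range(c_out):
        for r in range(out.shape[1]):
            for c in range(out.shape[2]):
                # sums over all channels and the kh x kw window at once
                out[o, r, c] = np.sum(x[:, r:r + kh, c:c + kw] * kernels[o])
    return out


rng = np.random.default_rng(0)
rgb = rng.random((3, 8, 8))            # hypothetical 3-channel 8x8 image
kernels = rng.random((4, 3, 3, 3))     # 4 kernels, each 3 channels deep, 3x3
fmap = conv_multi(rgb, kernels)
print(fmap.shape)                      # → (4, 6, 6): one feature map per kernel
```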
Feature map: a color image is usually considered to have three channels, R, G and B. When the image's pixel values pass through a convolution kernel, the result is a feature map. In general, filtering with N convolution kernels produces N feature maps; this is the number of "bean curd skin" layers in the figure below, which is also the depth of that layer. As the network deepens, the height and width of the feature maps shrink while the features extracted by each map of a convolutional layer become more representative, so later convolutional layers usually increase the number of feature maps, i.e. use more convolution kernels, to fully extract the features of the previous layer.
Feature selection: choosing N features out of the existing M, i.e. keeping only the most effective of the original features. It is somewhat similar to data preprocessing.
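A toy sketch of selecting N of M features. Scoring each feature by its variance is only one possible criterion, chosen here as an assumption for illustration; the helper `select_top_n` and the data are hypothetical.

```python
import numpy as np


def select_top_n(X, n):
    """Keep the n of M columns (features) with the highest variance.
    Variance is just one illustrative scoring criterion."""
    scores = X.var(axis=0)                  # one score per feature
    keep = np.argsort(scores)[::-1][:n]     # indices of the n best scores
    keep = np.sort(keep)                    # preserve the original column order
    return X[:, keep], keep


X = np.array([[1.0, 5.0, 0.0, 2.0],
              [1.0, 1.0, 0.1, 8.0],
              [1.0, 9.0, 0.0, 5.0]])        # 3 samples, M = 4 features
X_sel, kept = select_top_n(X, 2)
print(kept)    # → [1 3]
```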
Stride: the distance the convolution kernel moves horizontally and vertically after each convolution step.
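The stride directly determines the output size. This small helper (hypothetical name `conv_output_size`) uses the usual formula floor((n + 2p − k) / s) + 1, with input size n, kernel size k, stride s and zero padding p per side; with p = 0 it reduces to the `outw`/`outh` computation in `cov2` above.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1


print(conv_output_size(7, 3, stride=1))   # → 5
print(conv_output_size(7, 3, stride=2))   # → 3
```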
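Padding: as I understand it, padding fills the border of the input (usually with zeros) so that edge features are not lost. Edge pixels then fall under the kernel more often, so edge features are extracted and receive more attention, and the output can keep the same spatial size as the input.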
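A minimal sketch of zero padding with `np.pad`, whose default mode fills with zeros. Without padding, a 3×3 kernel shrinks a 5×5 input to 3×3 and the corner pixel is covered by only one kernel position; with one ring of zeros the output stays 5×5 and the corner is covered by four positions. The image, kernel and helper name are arbitrary.

```python
import numpy as np


def correlate2d_valid(img, kernel):
    """Stride-1, no-padding sliding window (multiply and sum)."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out


img = np.arange(1, 26, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9                          # 3x3 mean kernel

no_pad = correlate2d_valid(img, kernel)
padded = correlate2d_valid(np.pad(img, 1), kernel)    # one ring of zeros
print(no_pad.shape, padded.shape)                     # → (3, 3) (5, 5)

# How many kernel positions cover the top-left corner pixel?
corner = np.zeros((5, 5))
corner[0, 0] = 1
print(np.count_nonzero(correlate2d_valid(corner, np.ones((3, 3)))))             # → 1
print(np.count_nonzero(correlate2d_valid(np.pad(corner, 1), np.ones((3, 3)))))  # → 4
```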
Receptive field: the region of the original image that a single point of the last feature map corresponds to. Two stacked 3×3 convolutions have the same receptive field as one 5×5 convolution while using fewer weights (2 × 9 = 18 versus 25 per input/output channel pair), which improves computational efficiency.
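The receptive field of a stack of layers can be computed with the usual recurrence: each layer adds (k − 1) times the product of the strides before it. This is a sketch; the function name and the (kernel_size, stride) list format are hypothetical.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) from input to output."""
    rf, jump = 1, 1          # jump = distance in input pixels between adjacent outputs
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf


print(receptive_field([(3, 1), (3, 1)]))   # two stacked 3x3 convs → 5
print(receptive_field([(5, 1)]))           # one 5x5 conv          → 5
print(receptive_field([(3, 1)] * 3))       # three stacked 3x3     → 7
```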
2. Explore the role of different convolution kernels
① The effect of different kernel values on image convolution: try the interactive demo at https://setosa.io/ev/image-kernels/
② The effect of different kernel sizes:
Large convolution kernel
- Advantages: large receptive field
- Example: AlexNet, LeNet and other networks use relatively large convolution kernels, such as 5×5, 11×11
- Disadvantages: many parameters and a large amount of computation
Small convolution kernel
- Advantages: fewer parameters; less computation; stacking small kernels (e.g. three 3×3 layers in place of one 7×7) integrates three nonlinear activation layers instead of a single one, which increases the model's discriminative power
- Example: VGG and later networks
- Disadvantages: a small receptive field per layer; deeply stacked convolutions (i.e. stacked nonlinear activations) are prone to uncontrollable factors
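To quantify the parameter argument, here is a rough count for one 7×7 layer versus three stacked 3×3 layers, which cover the same 7×7 receptive field. The channel count of 64 (kept equal in every layer) and the inclusion of biases are assumptions for illustration only.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Learnable parameters of one k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)


C = 64                                   # hypothetical channel count
one_7x7 = conv_params(7, C, C)
three_3x3 = 3 * conv_params(3, C, C)
print(one_7x7, three_3x3)                # → 200768 110784
```

The stacked version needs roughly 55% of the parameters and also places a nonlinearity after each of its three layers.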
③ The effect of different kernel shapes:
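If a 3×3 kernel is separable (it can be written as a 3×1 column times a 1×3 row), we can replace it with a 1×3 kernel followed by a 3×1 kernel. For a 5×5 input (stride 1, no padding):
3×3 convolution: 9 output positions × 9 multiplications = 81 multiplications
1×3 convolution followed by 3×1 convolution: 3 × 15 + 3 × 9 = 72 multiplications (the 1×3 pass produces a 5×3 intermediate with 15 positions, the 3×1 pass produces the 3×3 output with 9 positions)
So the 1×3 + 3×1 pair needs fewer multiplications than the 3×3 kernel, and the saving grows with kernel size (roughly 2k instead of k² multiplications per output for a k×k kernel).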
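A quick check of both the equivalence and the counts, assuming a random 5×5 input and a Sobel-style separable kernel built as a column vector times a row vector (the helper name and kernel choice are illustrative):

```python
import numpy as np


def correlate2d_valid(img, kernel):
    """Stride-1, no-padding sliding window; also counts multiplications."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    mults = out.size * kernel.size     # one multiplication per kernel entry per output
    return out, mults


col = np.array([[1], [2], [1]])        # 3x1
row = np.array([[1, 0, -1]])           # 1x3
k3x3 = col @ row                       # separable (rank-1) 3x3 kernel

img = np.random.default_rng(0).random((5, 5))

full, m_full = correlate2d_valid(img, k3x3)
tmp, m1 = correlate2d_valid(img, row)  # 1x3 first: 5x3 intermediate
sep, m2 = correlate2d_valid(tmp, col)  # then 3x1: 3x3 output

print(np.allclose(full, sep))          # → True
print(m_full, m1 + m2)                 # → 81 72
```

A general (non-separable) 3×3 kernel cannot be split this way exactly.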
3. Programming implementation
1. Implement edge detection, sharpening and blurring of grayscale images. (required)
① Mean blur:
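As a minimal sketch of the three required operations, assuming common textbook kernels (a 3×3 mean blur, a sharpening kernel with centre weight 5 and a Laplacian-style edge detector; these are not confirmed to be the kernels used in this assignment), grayscale filtering with zero padding could look like this. The helper `filter2d_same` and the synthetic test image are hypothetical.

```python
import numpy as np


def filter2d_same(img, kernel):
    """Stride-1 sliding-window filter with zero padding so the output has
    the same size as the input (odd-sized kernel assumed, no flip)."""
    kh, kw = kernel.shape
    padded = np.pad(img.astype(float), ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(img.shape, dtype=float)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel)
    return out


mean_blur = np.ones((3, 3)) / 9
sharpen = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]])
edge = np.array([[-1, -1, -1],
                 [-1, 8, -1],
                 [-1, -1, -1]])        # Laplacian-style edge detector

# Hypothetical grayscale test image: dark left half, bright right half
img = np.zeros((6, 6))
img[:, 3:] = 255

blurred = filter2d_same(img, mean_blur)
sharpened = np.clip(filter2d_same(img, sharpen), 0, 255)
edges = np.abs(filter2d_same(img, edge))
print(edges)
```

To try it on a real photo, Pillow can load a grayscale array with `np.array(Image.open("peer.jpeg").convert("L"), dtype=float)`, using the file name tested later in this post.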
2. Adjust the convolution kernel parameters, test and summarize. (required)
The sharpening and blurring results above already show the effects of different operators. Next, let's modify the parameters of the edge-detection kernel and see how the result changes:
Test with peer.jpeg:
Modify the operator:
Modify the convolution kernel stride:
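A sketch of both modifications on a synthetic 8×8 image with one vertical edge (the image, helper name and the choice of Prewitt versus Sobel as the "modified operator" are assumptions for illustration):

```python
import numpy as np


def conv_stride(img, kernel, stride):
    """Sliding window with the given stride and no padding (no kernel flip)."""
    kh, kw = kernel.shape
    oh = (img.shape[0] - kh) // stride + 1
    ow = (img.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            rr, cc = r * stride, c * stride
            out[r, c] = np.sum(img[rr:rr + kh, cc:cc + kw] * kernel)
    return out


prewitt_x = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]])
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])       # same layout, centre row weighted by 2

# Hypothetical 8x8 image with a vertical edge between columns 3 and 4
img = np.zeros((8, 8))
img[:, 4:] = 255

results = {}
for name, k in [("Prewitt", prewitt_x), ("Sobel", sobel_x)]:
    for s in (1, 2):
        out = conv_stride(img, k, s)
        results[(name, s)] = out
        print(f"{name}, stride {s}: output {out.shape}, peak response {out.max()}")
```

Here, changing the operator rescales the edge response (Sobel peaks at 1020 versus 765 for Prewitt), while stride 2 shrinks the output from 6×6 to 3×3 and samples edge positions more coarsely.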
3. Use pictures of different sizes, test and summarize. (required)
Blur (mean blur):
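One way to reason about mean blur across image sizes, sketched on two hypothetical synthetic images (20×20 and 200×200, each with a vertical edge): a fixed 5×5 mean kernel always smears the edge over the same 4 pixels, which is a large fraction of a small image but a small fraction of a large one, so the same kernel looks much blurrier on small pictures.

```python
import numpy as np


def mean_blur(img, k):
    """k x k mean filter, stride 1, no padding (output shrinks by k - 1)."""
    h, w = img.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = img[r:r + k, c:c + k].mean()
    return out


def transition_width(row):
    """Number of pixels that are neither fully dark (0) nor fully bright (255)."""
    return int(np.sum((row > 0) & (row < 255)))


for size in (20, 200):                 # hypothetical small and large images
    img = np.zeros((size, size))
    img[:, size // 2:] = 255           # a vertical edge in the middle
    out = mean_blur(img, 5)
    width = transition_width(out[0])
    print(size, out.shape, width, f"{width / out.shape[1]:.1%} of the width")
```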
4. Explore more types of convolution kernels. (optional)
The sections above already display the results of different convolution operators.
My takeaway from this assignment centers on why convolution can extract features.
Experience: this experiment gave me a deeper understanding of how convolution is structured, including one-dimensional, two-dimensional and multi-dimensional convolution. I also learned what the convolution kernel does, how to implement edge detection, sharpening and blurring of images, what padding is for, the limitations of the stride... Overall, convolution is a very powerful tool for feature extraction. As for why convolution can extract features, an example I saw in class used a kernel to extract eye features: the kernel is effectively an "eye", and wherever it matches an eye in the image, the value of the corresponding point on the feature map is very large, which is how the feature is extracted. The mouse example below works the same way: the kernel resembles a mouse's ear, so when it slides over the mouse's ear, the feature-map values become very large, meaning the degree of match is high. This clearly explains why convolution can extract features.