# Seven basic questions about PyTorch convolutional layers

### 1. How to calculate the number of parameters of the ordinary convolutional layer?

An ordinary convolution acts differently along three dimensions. Along the spatial dimensions (H and W), the kernel weights are shared: the same kernel slides over the input and, at each position, multiplies and sums the values in its window (fusing spatial information). Along the input-channel dimension, each input channel has its own kernel slice, and the per-channel results are summed (fusing channel information). Along the output-channel dimension, the kernels are applied in parallel and their results are stacked, so there are as many output channels as there are convolution kernels.

Number of parameters of an ordinary convolutional layer = number of input channels × kernel size (e.g. 3 × 3) × number of output channels (i.e. the number of convolution kernels) + number of output channels (when the bias is included).
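The formula can be checked against PyTorch directly. The sketch below is a minimal check, assuming a layer with 64 input channels, 32 output channels and a 3 × 3 kernel (sizes chosen only for illustration); `count_params` is a hypothetical helper written for this example, not a PyTorch API.

```python
import torch
from torch import nn

def count_params(module: nn.Module) -> int:
    """Total number of learnable parameters in a module (hypothetical helper)."""
    return sum(p.numel() for p in module.parameters())

# Illustrative sizes (assumed): 64 input channels, 32 output channels, 3x3 kernel.
c_in, c_out, k = 64, 32, 3
conv = nn.Conv2d(c_in, c_out, kernel_size=k)                      # with bias
conv_no_bias = nn.Conv2d(c_in, c_out, kernel_size=k, bias=False)  # weights only

expected_with_bias = c_in * k * k * c_out + c_out
expected_no_bias = c_in * k * k * c_out

print(count_params(conv), expected_with_bias)
print(count_params(conv_no_bias), expected_no_bias)
```

Both printed pairs should agree, since the weight tensor has shape (out_channels, in_channels, k, k) and the bias has one entry per output channel.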

### 2. How to calculate the output size of the convolutional layer?

The output size of a convolution is given by o = (i + 2p - k') // s + 1

For a dilated (atrous) convolution, the effective kernel size is k' = d(k - 1) + 1; for an ordinary convolution d = 1, so k' = k.

Here o is the output size, i is the input size, p is the padding, k is the kernel size, s is the stride, and d is the dilation factor.
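As a quick sanity check, the formula can be wrapped in a small function and compared with what `F.conv2d` actually returns. This is only a sketch: `conv_output_size` is a hypothetical helper, and the 28 × 28 input and the stride/padding/dilation combinations are arbitrary values chosen for illustration.

```python
import torch
import torch.nn.functional as F

def conv_output_size(i, k, s=1, p=0, d=1):
    """o = (i + 2p - k') // s + 1 with k' = d(k - 1) + 1 (hypothetical helper)."""
    k_eff = d * (k - 1) + 1
    return (i + 2 * p - k_eff) // s + 1

x = torch.randn(1, 1, 28, 28)   # i = 28
w = torch.randn(1, 1, 3, 3)     # k = 3

results = []
for s, p, d in [(1, 0, 1), (2, 1, 1), (1, 2, 2), (3, 1, 2)]:
    out = F.conv2d(x, w, stride=s, padding=p, dilation=d)
    predicted = conv_output_size(28, 3, s=s, p=p, d=d)
    results.append((out.shape[-1], predicted))
    print(f"s={s} p={p} d={d}: actual={out.shape[-1]} predicted={predicted}")
```

The same formula applies independently to the height and the width, so non-square inputs or kernels only require calling it once per dimension.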

### 3. What is the function of dilated convolution? What are the disadvantages?

Compared with ordinary convolution, dilated convolution enlarges the receptive field without adding parameters, and it is often used in image segmentation. Its disadvantage is the possible gridding effect: some pixels fall into the holes of the kernel and never contribute to the output. This can be mitigated by stacking dilated convolutions with different dilation factors. Reference article: https://developer.orbbec.com.cn/v/blog_detail/892
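The grid (gridding) effect can be made visible by counting which input pixels influence a single output pixel. The sketch below is one possible way to do this, not a standard recipe: `touched_pixels` is a hypothetical helper that stacks 3 × 3 convolutions with all-ones weights and uses the gradient of the center output with respect to the input to mark the pixels that contribute.

```python
import torch
from torch import nn

def touched_pixels(dilations, size=15):
    """Count input pixels that influence the center output pixel (hypothetical helper)."""
    layers = []
    for d in dilations:
        # padding=d keeps the spatial size unchanged for a 3x3 kernel
        conv = nn.Conv2d(1, 1, kernel_size=3, dilation=d, padding=d, bias=False)
        nn.init.ones_(conv.weight)   # positive weights, so contributions cannot cancel
        layers.append(conv)
    net = nn.Sequential(*layers)

    x = torch.zeros(1, 1, size, size, requires_grad=True)
    out = net(x)
    out[0, 0, size // 2, size // 2].backward()
    return int((x.grad != 0).sum())

same = touched_pixels([2, 2])    # two layers, both with dilation 2
mixed = touched_pixels([1, 2])   # dilation 1 followed by dilation 2

print("dilations (2, 2):", same, "pixels inside a 9x9 window")
print("dilations (1, 2):", mixed, "pixels inside a 7x7 window")
```

With dilations (2, 2) only every second row and column is reached (25 of the 81 pixels in the 9 × 9 window), while dilations (1, 2) reach all 49 pixels of the 7 × 7 window, which is why mixing dilation factors is used to fill the holes.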

### 4. What is group convolution and what is the function of group convolution?

Compared with ordinary convolution, group convolution divides the input channels into g groups and divides the convolution kernels into the corresponding g groups. Each kernel convolves only over its own group of input channels, and the results of the g groups are concatenated along the channel dimension. Since each kernel sees only 1/g of the input channels, the number of weight parameters drops to 1/g of that of an ordinary convolution. Group convolution requires both the number of input channels and the number of output channels to be integer multiples of g. Reference article: https://zhuanlan.zhihu.com/p/65377955
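The 1/g reduction and the divisibility requirement can both be checked with `nn.Conv2d` and its `groups` argument. The sizes below (64 input channels, 32 output channels, g = 8) are assumed for illustration, and biases are switched off so that only the kernel weights are compared.

```python
from torch import nn

c_in, c_out, k, g = 64, 32, 3, 8   # illustrative sizes (assumed)

conv = nn.Conv2d(c_in, c_out, kernel_size=k, bias=False)
conv_group = nn.Conv2d(c_in, c_out, kernel_size=k, groups=g, bias=False)

n_conv = conv.weight.numel()         # c_out * c_in * k * k
n_group = conv_group.weight.numel()  # c_out * (c_in // g) * k * k
print(n_conv, n_group, n_conv // n_group)

# The channel counts must be divisible by the number of groups.
try:
    nn.Conv2d(c_in, c_out, kernel_size=k, groups=5)
    divisible_check_failed = False
except ValueError as err:
    divisible_check_failed = True
    print("rejected:", err)
```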

### 5. What is a depthwise separable convolution, and what are the advantages of a depthwise separable convolution compared to ordinary convolution?

The idea of depthwise separable convolution is to split the two jobs of a convolution, fusing spatial information and fusing channel information, into two independent steps. First, a group convolution with g = m (the number of input channels) fuses spatial information channel by channel; then n (the number of output channels) 1 × 1 convolutions fuse channel information. Ignoring biases, its parameter count is m×k×k + n×m, far fewer than the m×n×k×k of an ordinary convolution. In addition, because the spatial fusion and channel fusion steps are decoupled, it can often achieve better results than ordinary convolution.
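To put numbers on the comparison, the sketch below counts weights for m = 64, n = 32, k = 3 (values assumed for illustration), with biases disabled so the counts match the formulas exactly.

```python
from torch import nn

m, n, k = 64, 32, 3   # input channels, output channels, kernel size (assumed)

depthwise = nn.Conv2d(m, m, kernel_size=k, groups=m, bias=False)  # m * k * k weights
pointwise = nn.Conv2d(m, n, kernel_size=1, bias=False)            # n * m weights
ordinary = nn.Conv2d(m, n, kernel_size=k, bias=False)             # m * n * k * k weights

separable_params = depthwise.weight.numel() + pointwise.weight.numel()
ordinary_params = ordinary.weight.numel()

print("separable:", separable_params)
print("ordinary: ", ordinary_params)
print("ratio:     %.2f" % (ordinary_params / separable_params))
```

For these sizes the separable version needs roughly one seventh of the weights; the saving grows with the kernel size and the number of output channels.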

### 6. What is transposed convolution/deconvolution? What does it do?

An ordinary convolution usually shrinks the feature map, whereas transposed convolution (also called deconvolution) does the opposite and enlarges it. There are two ways to understand transposed convolution. The first is to view it as a special convolution that restores the feature map size by padding its input appropriately. The second is based on the matrix-multiplication form of convolution: transposed convolution transposes the matrix that represents the convolution kernel and multiplies it by the flattened output feature map, which yields a vector with the size of the original input feature map. Note that only the size is restored, not the original values. Reference article: https://zhuanlan.zhihu.com/p/115070523
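The matrix-transpose view can be checked numerically. If a convolution is written as y = Cx, then backpropagating a tensor g through it computes Cᵀg, so the gradient of `F.conv2d` with respect to its input should match `F.conv_transpose2d` applied with the same weights. The sketch below assumes stride 1, no padding, and arbitrary small tensor sizes.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 2, 5, 5, requires_grad=True)  # input: 2 channels, 5x5
w = torch.randn(3, 2, 3, 3)                      # 3 kernels of size 2x3x3

y = F.conv2d(x, w)        # shape (1, 3, 3, 3): the feature map shrinks
g = torch.randn_like(y)   # any tensor with the shape of the conv output

# C^T g obtained through autograd (vector-Jacobian product of the convolution)
(vjp,) = torch.autograd.grad(y, x, grad_outputs=g)

# The same product computed by transposed convolution with the same weights
up = F.conv_transpose2d(g, w)   # shape (1, 2, 5, 5): back to the input size

print(up.shape)
print(torch.allclose(vjp, up, atol=1e-4))

# Only the size is restored, not the values of the original input.
restored = F.conv_transpose2d(y.detach(), w)
print(torch.allclose(restored, x.detach()))
```

The weight tensor is reused unchanged because `conv_transpose2d` expects weights of shape (in_channels, out_channels, kH, kW), where its input channels are the output channels of the forward convolution.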

### 7. What are the commonly used upsampling methods in the CV field?

Besides transposed convolution, bilinear interpolation is commonly used for upsampling in image segmentation. It has no parameters to learn and usually gives better results. Nearest-neighbor interpolation can also be used for upsampling, but it is less common. Another upsampling method is unpooling, which is also rarely used.
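Unpooling is the least familiar of these methods, so here is a minimal sketch using `nn.MaxPool2d` with `return_indices=True` together with `nn.MaxUnpool2d`. The 4 × 4 input is an arbitrary example; the point is that each pooled value is written back to the position it came from and every other position is filled with zero.

```python
import torch
from torch import nn

x = torch.arange(1, 17, dtype=torch.float32).view(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

pooled, indices = pool(x)           # 2x2 map of window maxima plus their positions
restored = unpool(pooled, indices)  # 4x4 map, zeros except at the stored positions

print(pooled)
print(restored)
```

Like bilinear and nearest-neighbor interpolation, unpooling has no learnable parameters, but it needs the indices saved by a matching pooling layer.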

### Code demo 1: Demonstration of convolution output size relationship

```python
import torch
from torch import nn
import torch.nn.functional as F

# Convolution output size formula: o = (i + 2*p - k') // s + 1
# Dilated (atrous) convolution: k' = d*(k-1) + 1
# o: output size, i: input size, p: padding, k: kernel size, s: stride, d: dilation factor

inputs = torch.arange(0,25).view(1,1,5,5).float() # i= 5
filters = torch.tensor([[[[1.0,1],[1,1]]]]) # k = 2

outputs = F.conv2d(inputs, filters) # o = (5+2*0-2)//1+1 = 4
outputs_s2 = F.conv2d(inputs, filters, stride=2)  #o = (5+2*0-2)//2+1 = 2
outputs_p1 = F.conv2d(inputs, filters, padding=1) #o = (5+2*1-2)//1+1 = 6
outputs_d2 = F.conv2d(inputs, filters, dilation=2) # o = (5+2*0-(2*(2-1)+1))//1+1 = 3

print("--inputs--")
print(inputs)
print("--filters--")
print(filters)

print("--outputs--")
print(outputs,"\n")

print("--outputs(stride=2)--")
print(outputs_s2,"\n")

# padding=1 result (printed without a header line)
print(outputs_p1,"\n")

print("--outputs(dilation=2)--")
print(outputs_d2,"\n")

```

The output is as follows:

```
--inputs--
tensor([[[[ 0.,  1.,  2.,  3.,  4.],
          [ 5.,  6.,  7.,  8.,  9.],
          [10., 11., 12., 13., 14.],
          [15., 16., 17., 18., 19.],
          [20., 21., 22., 23., 24.]]]])
--filters--
tensor([[[[1., 1.],
          [1., 1.]]]])
--outputs--
tensor([[[[12., 16., 20., 24.],
          [32., 36., 40., 44.],
          [52., 56., 60., 64.],
          [72., 76., 80., 84.]]]])

--outputs(stride=2)--
tensor([[[[12., 20.],
          [52., 60.]]]])

tensor([[[[ 0.,  1.,  3.,  5.,  7.,  4.],
          [ 5., 12., 16., 20., 24., 13.],
          [15., 32., 36., 40., 44., 23.],
          [25., 52., 56., 60., 64., 33.],
          [35., 72., 76., 80., 84., 43.],
          [20., 41., 43., 45., 47., 24.]]]])

--outputs(dilation=2)--
tensor([[[[24., 28., 32.],
          [44., 48., 52.],
          [64., 68., 72.]]]])
```

### Code demo 2: Demonstration of the number of convolutional layer parameters

```python
import torch
from torch import nn

features = torch.randn(8,64,128,128)
print("features.shape:",features.shape)
print("\n")

#Ordinary convolution
print("--conv--")
conv = nn.Conv2d(in_channels=64,out_channels=32,kernel_size=3)
conv_out = conv(features)
print("conv_out.shape:",conv_out.shape)
print("conv.weight.shape:",conv.weight.shape)
print("\n")

#group convolution
print("--group conv--")
conv_group = nn.Conv2d(in_channels=64,out_channels=32,kernel_size=3,groups=8)
group_out = conv_group(features)
print("group_out.shape:",group_out.shape)
print("conv_group.weight.shape:",conv_group.weight.shape)
print("\n")

#Depthwise Separable Convolution
print("--separable conv--")
depth_conv = nn.Conv2d(in_channels=64,out_channels=64,kernel_size=3,groups=64)
oneone_conv = nn.Conv2d(in_channels=64,out_channels=32,kernel_size=1)
separable_conv = nn.Sequential(depth_conv,oneone_conv)
separable_out = separable_conv(features)
print("separable_out.shape:",separable_out.shape)
print("depth_conv.weight.shape:",depth_conv.weight.shape)
print("oneone_conv.weight.shape:",oneone_conv.weight.shape)
print("\n")

#transposed convolution
print("--conv transpose--")
conv_t = nn.ConvTranspose2d(in_channels=32,out_channels=64,kernel_size=3)
features_like = conv_t(conv_out)
print("features_like.shape:",features_like.shape)
print("conv_t.weight.shape:",conv_t.weight.shape)

```

The output is as follows:

```
features.shape: torch.Size([8, 64, 128, 128])

--conv--
conv_out.shape: torch.Size([8, 32, 126, 126])
conv.weight.shape: torch.Size([32, 64, 3, 3])

--group conv--
group_out.shape: torch.Size([8, 32, 126, 126])
conv_group.weight.shape: torch.Size([32, 8, 3, 3])

--separable conv--
separable_out.shape: torch.Size([8, 32, 126, 126])
depth_conv.weight.shape: torch.Size([64, 1, 3, 3])
oneone_conv.weight.shape: torch.Size([32, 64, 1, 1])

--conv transpose--
features_like.shape: torch.Size([8, 64, 128, 128])
conv_t.weight.shape: torch.Size([32, 64, 3, 3])
```

### Code Demo 3: Upsampling Layer

```python
import torch
from torch import nn

inputs = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2)
print("inputs:")
print(inputs)
print("\n")

nearest = nn.Upsample(scale_factor=2, mode='nearest')
bilinear = nn.Upsample(scale_factor=2,mode="bilinear",align_corners=True)

print("nearest(inputs): ")
print(nearest(inputs))
print("\n")
print("bilinear(inputs): ")
print(bilinear(inputs))

```

The output is as follows:

```
inputs:
tensor([[[[1., 2.],
          [3., 4.]]]])

nearest(inputs):
tensor([[[[1., 1., 2., 2.],
          [1., 1., 2., 2.],
          [3., 3., 4., 4.],
          [3., 3., 4., 4.]]]])

bilinear(inputs):
tensor([[[[1.0000, 1.3333, 1.6667, 2.0000],
          [1.6667, 2.0000, 2.3333, 2.6667],
          [2.3333, 2.6667, 3.0000, 3.3333],
          [3.0000, 3.3333, 3.6667, 4.0000]]]])
```