Thesis title: "MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices"
Paper address: https://arxiv.org/pdf/1804.07573v4.pdf
1 Introduction
In recent years, lightweight networks such as MobileNetV1, ShuffleNet, and MobileNetV2 have been widely used for visual recognition tasks on mobile devices. However, because of the particular structure of faces, these general-purpose networks do not achieve satisfactory accuracy on face recognition tasks. To address this problem, the paper proposes MobileFaceNet, a lightweight network designed specifically for face recognition.
As shown in the figure below, when a network such as MobileNetV2 is used for face recognition, the global average pooling layer assigns the same weight to the corner units and the center units of FMap-end (the last feature map). For face recognition, however, the center units are clearly more important than the corner units, so the network needs targeted optimization. The paper's most important optimization is to replace Global Average Pooling (GAP) with a Global Depthwise Convolution (GDConv), because the learned weights of GDConv act as importance factors for the different spatial positions.
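To make this relationship concrete, here is a minimal PyTorch sketch (the 512-channel 7x7 feature-map size is an assumption matching FMap-end for a 112x112 input): a global depthwise convolution whose weights are all fixed to 1/49 reproduces GAP exactly, and GDConv simply makes those per-position weights learnable.

```python
import torch
import torch.nn as nn

# Sketch: GAP is a special case of GDConv. A 7x7 depthwise conv whose
# weights are all fixed to 1/49 reproduces global average pooling;
# GDConv makes those per-position weights learnable instead.
x = torch.randn(1, 512, 7, 7)

gdconv = nn.Conv2d(512, 512, kernel_size=7, groups=512, bias=False)
nn.init.constant_(gdconv.weight, 1.0 / 49)           # uniform spatial weights

gap = torch.nn.functional.adaptive_avg_pool2d(x, 1)  # standard GAP
out = gdconv(x)

print(torch.allclose(out, gap, atol=1e-5))  # True: identical outputs
```

Once trained, the GDConv weights are no longer uniform, which is exactly how the network learns to weight center units more heavily than corner units.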
2. Global Depthwise Convolution (GDConv)
The author replaces global average pooling (GAP) with a global depthwise convolution (GDConv). The kernel of the GDConv layer has the same spatial size as its input feature map, with pad = 0 and stride = 1. Let F be the input feature map of size W×H×M and K the depthwise convolution kernel of size W×H×M; the output G of size 1×1×M is computed as

G_m = Σ_{i,j} K_{i,j,m} · F_{i,j,m}

where (i, j) ranges over the spatial positions and m indexes the channels. The computational cost of GDConv is W·H·M multiply-adds.
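As a sketch (assuming the 7x7x512 FMap-end that MobileFaceNet produces for a 112x112 input), the GDConv layer can be written as a depthwise convolution whose kernel covers the entire feature map:

```python
import torch
import torch.nn as nn

# GDConv as described above: kernel size equal to the input spatial size,
# pad=0, stride=1, one filter per channel (groups = channels), followed by
# BN and no activation (it is a linear layer in the paper's sense).
gdconv = nn.Sequential(
    nn.Conv2d(512, 512, kernel_size=7, stride=1, padding=0, groups=512, bias=False),
    nn.BatchNorm2d(512),
)

x = torch.randn(2, 512, 7, 7)   # assumed FMap-end: W = H = 7, M = 512
g = gdconv(x)
print(g.shape)                  # torch.Size([2, 512, 1, 1])

# Computational cost W*H*M from the formula above:
print(7 * 7 * 512)              # 25088 multiply-adds
```

For comparison, GAP over the same feature map also costs W·H·M additions but contributes no parameters, while GDConv adds W·H·M = 25,088 parameters here.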
3. Network structure
The author builds the network from the bottlenecks of MobileNetV2 as the main module, but with smaller expansion factors than in MobileNetV2, and uses PReLU as the activation function (slightly better than ReLU). In addition, the network applies fast downsampling at its beginning and early dimensionality reduction in its last few convolution layers, and a linear 1×1 convolution layer is added after the linear global depthwise convolution layer as the feature output. Batch normalization is employed during training.
The original network requires 221 million MAdds and has 0.99 million parameters. To reduce computation, the input resolution can be changed from 112×112 to 112×96 or 96×96. To reduce the parameter count, removing the 1×1 convolution layer after the GDConv layer produces a new network, MobileFaceNet-M; on the basis of MobileFaceNet-M, further removing the 1×1 convolution layer before the GDConv layer produces MobileFaceNet-S.
4. Experiment
For comparison, the author also trains MobileNetV1, ShuffleNet, and MobileNetV2 as baseline networks under the same settings. The weight decay is set to 4e-5, except for the global operator layer (GDConv or GAPool) at the end of the network, whose weight decay is set to 4e-4. SGD with a momentum of 0.9 is used as the optimizer, the batch size is set to 512, and the learning rate starts at 0.1 and is divided by 10 at 36K, 52K, and 58K iterations. Training stops at 60K iterations.
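The schedule above can be sketched as follows (a stand-in parameter replaces the real model, and the separate 4e-4 weight decay for the global operator layer would in practice be its own parameter group; both are simplifications for illustration):

```python
import torch

# Stand-in parameter; in practice this would be model.parameters(), with the
# GDConv/GAPool layer in its own parameter group using weight_decay=4e-4.
params = [torch.nn.Parameter(torch.randn(2, 2))]

# SGD with momentum 0.9 and base weight decay 4e-5.
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9, weight_decay=4e-5)

# Divide the learning rate by 10 at 36K, 52K, and 58K iterations.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[36_000, 52_000, 58_000], gamma=0.1)

for step in range(60_000):      # stop training at 60K iterations
    # ... forward pass, loss.backward(), optimizer.step(), zero_grad() ...
    scheduler.step()

print(optimizer.param_groups[0]["lr"])   # ~1e-4 after all three drops
```

The milestone values here are iteration counts, so this assumes `scheduler.step()` is called once per batch rather than once per epoch.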
5. The code
####################################### MobileFaceNet #######################################
import torch
from torch.nn import (Module, Conv2d, BatchNorm2d, BatchNorm1d, PReLU,
                      Sequential, Linear, Flatten)


def l2_norm(x, axis=1):
    # Normalize each embedding to unit L2 length.
    norm = torch.norm(x, 2, axis, True)
    return torch.div(x, norm)


class Conv_block(Module):
    # Conv -> BN -> PReLU.
    def __init__(self, in_c, out_c, kernel=(1, 1), stride=(1, 1), padding=(0, 0), groups=1):
        super(Conv_block, self).__init__()
        self.conv = Conv2d(in_c, out_channels=out_c, kernel_size=kernel, groups=groups,
                           stride=stride, padding=padding, bias=False)
        self.bn = BatchNorm2d(out_c)
        self.prelu = PReLU(out_c)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.prelu(x)
        return x


class Linear_block(Module):
    # Conv -> BN with no activation (a "linear" layer in the paper's sense).
    def __init__(self, in_c, out_c, kernel=(1, 1), stride=(1, 1), padding=(0, 0), groups=1):
        super(Linear_block, self).__init__()
        self.conv = Conv2d(in_c, out_channels=out_c, kernel_size=kernel, groups=groups,
                           stride=stride, padding=padding, bias=False)
        self.bn = BatchNorm2d(out_c)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return x


class Depth_Wise(Module):
    # MobileNetV2-style bottleneck: 1x1 expansion -> depthwise 3x3 -> 1x1 linear
    # projection. Here `groups` is the expanded (inner) channel count.
    def __init__(self, in_c, out_c, residual=False, kernel=(3, 3), stride=(2, 2),
                 padding=(1, 1), groups=1):
        super(Depth_Wise, self).__init__()
        self.conv = Conv_block(in_c, out_c=groups, kernel=(1, 1), padding=(0, 0), stride=(1, 1))
        self.conv_dw = Conv_block(groups, groups, groups=groups, kernel=kernel,
                                  padding=padding, stride=stride)
        self.project = Linear_block(groups, out_c, kernel=(1, 1), padding=(0, 0), stride=(1, 1))
        self.residual = residual

    def forward(self, x):
        if self.residual:
            short_cut = x
        x = self.conv(x)
        x = self.conv_dw(x)
        x = self.project(x)
        if self.residual:
            output = short_cut + x
        else:
            output = x
        return output


class Residual(Module):
    # A stack of num_block bottlenecks with identity shortcuts.
    def __init__(self, c, num_block, groups, kernel=(3, 3), stride=(1, 1), padding=(1, 1)):
        super(Residual, self).__init__()
        modules = []
        for _ in range(num_block):
            modules.append(Depth_Wise(c, c, residual=True, kernel=kernel,
                                      padding=padding, stride=stride, groups=groups))
        self.model = Sequential(*modules)

    def forward(self, x):
        return self.model(x)


class MobileFaceNet(Module):
    def __init__(self, embedding_size):
        super(MobileFaceNet, self).__init__()
        # Fast downsampling at the start: stride-2 conv, then a depthwise conv.
        self.conv1 = Conv_block(3, 64, kernel=(3, 3), stride=(2, 2), padding=(1, 1))
        self.conv2_dw = Conv_block(64, 64, kernel=(3, 3), stride=(1, 1), padding=(1, 1), groups=64)
        self.conv_23 = Depth_Wise(64, 64, kernel=(3, 3), stride=(2, 2), padding=(1, 1), groups=128)
        self.conv_3 = Residual(64, num_block=4, groups=128, kernel=(3, 3), stride=(1, 1), padding=(1, 1))
        self.conv_34 = Depth_Wise(64, 128, kernel=(3, 3), stride=(2, 2), padding=(1, 1), groups=256)
        self.conv_4 = Residual(128, num_block=6, groups=256, kernel=(3, 3), stride=(1, 1), padding=(1, 1))
        self.conv_45 = Depth_Wise(128, 128, kernel=(3, 3), stride=(2, 2), padding=(1, 1), groups=512)
        self.conv_5 = Residual(128, num_block=2, groups=256, kernel=(3, 3), stride=(1, 1), padding=(1, 1))
        self.conv_6_sep = Conv_block(128, 512, kernel=(1, 1), stride=(1, 1), padding=(0, 0))
        # GDConv: a linear 7x7 depthwise conv spanning the whole 7x7 FMap-end.
        self.conv_6_dw = Linear_block(512, 512, groups=512, kernel=(7, 7), stride=(1, 1), padding=(0, 0))
        self.conv_6_flatten = Flatten()
        self.linear = Linear(512, embedding_size, bias=False)
        self.bn = BatchNorm1d(embedding_size)

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2_dw(out)
        out = self.conv_23(out)
        out = self.conv_3(out)
        out = self.conv_34(out)
        out = self.conv_4(out)
        out = self.conv_45(out)
        out = self.conv_5(out)
        out = self.conv_6_sep(out)
        out = self.conv_6_dw(out)
        out = self.conv_6_flatten(out)
        out = self.linear(out)
        out = self.bn(out)
        return l2_norm(out)