Author: Zhan Pengzhou, Intel IoT Industry Innovation Ambassador
This article introduces common techniques for embedding data preprocessing into AI models with the OpenVINO™ Model Optimizer or the preprocessing API, helping readers further improve the end-to-end performance of AI inference programs. All sample programs in this article are open source at https://gitee.com/ppov-nuc/resnet_ov_ppp.git and were tested on the AI developer kit based on a 12th Gen Intel® Core™ processor.
Taking YOLOv5 model conversion as an example, the Model Optimizer command:
```
mo --input_model yolov5s.onnx --data_type FP16
```
This command converts the yolov5s.onnx model to an IR-format model and converts the model precision from FP32 to FP16; this is the most common way to use the Model Optimizer. When writing an AI inference program based on the above IR model, the numerical precision and shape of the image data differ from the numerical precision and shape required by the model input node, so the data must be preprocessed before being fed into the model.
Taking the YOLOv5 model as an example, use the zidane.jpg image that comes with the YOLOv5 repository and print the numerical precision and shape of the image as well as those of the model input node. The comparison is shown in Figure 1-1.
Figure 1-1 Image read by OpenCV vs. model input node
As can be seen from the figure above, the image data read by OpenCV's imread() function differs from the requirements of the model input node in data shape, numerical precision, value range, color channel order, and data layout, as summarized in the following table.
| | Image data read by OpenCV | YOLOv5 model input node |
| --- | --- | --- |
| Data shape (shape) | [720, 1280, 3] | [1, 3, 640, 640] |
| Numerical precision (dtype) | UINT8 | FP32 |
| Value range | 0 – 255 | 0.0 – 1.0 |
| Color channel order | BGR | RGB |
| Data layout (layout) | HWC | NCHW |
Due to the above differences, the data must be preprocessed before being passed into the model to meet the requirements of the model input node. Data preprocessing can be implemented by programming it in the inference code, by using the Model Optimizer, or by using the OpenVINO™ preprocessing API; the latter two approaches are introduced in detail in this article.
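For reference, the first option, implementing the preprocessing directly in the inference code, typically looks like the minimal sketch below (assumptions: zidane.jpg is in the working directory, and YOLOv5's letterbox padding is omitted for brevity):

```python
import cv2
import numpy as np

img = cv2.imread("zidane.jpg")                     # HWC, BGR, UINT8, 0 - 255
img = cv2.resize(img, (640, 640))                  # match the 640x640 spatial size
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)         # BGR -> RGB
blob = img.astype(np.float32) / 255.0              # UINT8 0 - 255 -> FP32 0.0 - 1.0
blob = np.expand_dims(blob.transpose(2, 0, 1), 0)  # HWC -> NCHW, add batch dimension
print(blob.shape, blob.dtype)                      # (1, 3, 640, 640) float32
```

Every one of these steps has to be repeated in every program that uses the model, which is exactly what the following two approaches avoid.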
1.1 Data preprocessing with model optimizer
1.1.1 Model optimizer preprocessing parameters
The Model Optimizer can embed preprocessing operations such as color channel order adjustment and image data normalization into the model (refer to "Interpretation of Key Points of OpenVINO™ Model Conversion Technology") by specifying the following parameters:
- --mean_values: mean_values is subtracted from all input data, i.e., input - mean_values
- --scale_values: all input data is divided by scale_values; when both mean_values and scale_values are specified, the Model Optimizer computes (input - mean_values) ÷ scale_values
- --reverse_input_channels: swaps the input channel order from RGB to BGR (or vice versa)
When the above three operations are specified at the same time, the preprocessing sequence is:
Input data → reverse_input_channels → mean_values → scale_values → original model
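Expressed as a NumPy sketch (for illustration only, not how the Model Optimizer is implemented internally; the mean/scale numbers are the ResNet values used later in this article), the data the original model actually receives is:

```python
import numpy as np

mean_values = np.array([123.675, 116.28, 103.53], dtype=np.float32)
scale_values = np.array([58.395, 57.12, 57.375], dtype=np.float32)

bgr_image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # stand-in for an OpenCV image
rgb_image = bgr_image[:, :, ::-1]                        # reverse_input_channels: BGR -> RGB
model_input = (rgb_image - mean_values) / scale_values   # (input - mean_values) / scale_values
# model_input is what the original model sees after the embedded preprocessing
```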
When converting the model, assuming the inference program reads images with the OpenCV library, you can add the three parameters mean_values, scale_values, and reverse_input_channels to the Model Optimizer command and thereby embed the color channel reordering and image data normalization into the model. If the inference program reads images with a non-OpenCV library such as PIL.Image, there is no need to add the --reverse_input_channels parameter, as the quick check below illustrates.
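A small illustrative check of the channel-order difference (not part of the sample repository; assumes Pillow is installed and zidane.jpg is available):

```python
from PIL import Image
import cv2

img_pil = Image.open("zidane.jpg")  # Pillow decodes JPEG images in RGB order
img_cv = cv2.imread("zidane.jpg")   # OpenCV decodes images in BGR order
print(img_pil.mode)                 # 'RGB' -> no --reverse_input_channels needed
print(img_cv.shape)                 # (720, 1280, 3), channels stored as BGR
```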
The following takes the ResNet model as an example to show the complete process of embedding preprocessing into the model with the Model Optimizer.
1.1.2 Embedding the preprocessing of the ResNet model into the model
ResNet was not only the champion of the 2015 ILSVRC competition but is also a convolutional neural network commonly used in industrial practice. PyTorch has integrated ResNet into torchvision; the complete code for converting the PyTorch-format ResNet model to ONNX format is as follows:
```python
from torchvision.models import resnet50, ResNet50_Weights
import torch

# https://pytorch.org/vision/stable/models/generated/torchvision.models.resnet50.html
weights = ResNet50_Weights.IMAGENET1K_V2
model = resnet50(weights=weights, progress=False).cpu().eval()

# define input and output node
dummy_input = torch.randn(1, 3, 224, 224, device="cpu")
input_names, output_names = ["images"], ["output"]

torch.onnx.export(model,
                  dummy_input,
                  "resnet50.onnx",
                  verbose=True,
                  input_names=input_names,
                  output_names=output_names,
                  opset_version=13)
```
When exporting a PyTorch-format model to ONNX format, note that the operator set version (opset_version) should preferably be ≥ 11. In addition, OpenVINO™ 2022.2 supports ONNX 1.8.1, which corresponds to opset_version=13, so this article sets opset_version to 13.
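As an optional sanity check (assuming the onnx package is installed), the opset of the exported model can be inspected:

```python
import onnx

onnx_model = onnx.load("resnet50.onnx")
print([(opset.domain, opset.version) for opset in onnx_model.opset_import])  # e.g. [('', 13)]
onnx.checker.check_model(onnx_model)  # raises an error if the model is malformed
```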
The normalization parameters of the ResNet model trained on the ImageNet 1k dataset are:
- mean_values= [123.675,116.28,103.53]
- scale_values=[58.395,57.12,57.375]
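These values are simply the widely used ImageNet per-channel mean and standard deviation rescaled from the 0–1 range to the 0–255 range, as a quick check shows:

```python
import numpy as np

imagenet_mean = np.array([0.485, 0.456, 0.406])  # RGB mean in the 0-1 range
imagenet_std = np.array([0.229, 0.224, 0.225])   # RGB std in the 0-1 range
print(imagenet_mean * 255)  # [123.675 116.28  103.53 ]
print(imagenet_std * 255)   # [58.395 57.12  57.375]
```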
The command to convert the ONNX model to the OpenVINO™ IR model is:
```
mo -m resnet50.onnx --mean_values=[123.675,116.28,103.53] --scale_values=[58.395,57.12,57.375] --data_type FP16 --reverse_input_channels
```
After obtaining the ResNet50 IR model, the following program can be used to complete the inference computation:
```python
from openvino.runtime import Core
import cv2
import numpy as np

core = Core()
resnet50 = core.compile_model("resnet50.xml", "CPU")
output_node = resnet50.outputs[0]

# Resize
img = cv2.resize(cv2.imread("cat.jpg"), [224, 224])
# Layout: HWC -> NCHW
blob = np.expand_dims(np.transpose(img, (2, 0, 1)), 0)

result = resnet50(blob)[output_node]
print(np.argmax(result))
```
In the above inference code, operations such as resizing the image and changing the image data layout are still implemented in the inference code itself. Next, this article introduces the OpenVINO™ preprocessing API, which can embed even more preprocessing operations into the model.
1.2 Data preprocessing with the OpenVINO™ preprocessing API
Starting from OpenVINO™ 2022.1, OpenVINO™ provides a set of preprocessing APIs for embedding data preprocessing into the model (refer to "Use OpenVINO™ Preprocessing API to Further Improve YOLOv5 Inference Performance"). The benefits of embedding data preprocessing into the model are:
- Improve the portability of AI models (the inference code does not need to implement preprocessing)
- Improve utilization of inference devices (e.g., Intel® Integrated Graphics/Discrete Graphics)
- Improve the end-to-end performance of AI programs
The complete sample program export_resnet_ov_ppp.py, which uses the OpenVINO™ preprocessing API to embed the preprocessing into the model, is as follows:
```python
from openvino.preprocess import PrePostProcessor, ColorFormat, ResizeAlgorithm
from openvino.runtime import Core, Layout, Type, serialize

# ======== Step 0: read original model =========
core = Core()
model = core.read_model("resnet50.onnx")

# ======== Step 1: Preprocessing ================
ppp = PrePostProcessor(model)
# Declare section of desired application's input format
ppp.input("images").tensor() \
    .set_element_type(Type.u8) \
    .set_spatial_dynamic_shape() \
    .set_layout(Layout('NHWC')) \
    .set_color_format(ColorFormat.BGR)
# Specify actual model layout
ppp.input("images").model().set_layout(Layout('NCHW'))
# Explicit preprocessing steps. Layout conversion will be done automatically as last step
ppp.input("images").preprocess() \
    .convert_element_type() \
    .convert_color(ColorFormat.RGB) \
    .resize(ResizeAlgorithm.RESIZE_LINEAR) \
    .mean([123.675, 116.28, 103.53]) \
    .scale([58.395, 57.12, 57.375])
# Dump preprocessor
print(f'Dump preprocessor: {ppp}')
model = ppp.build()

# ======== Step 2: Save the model with preprocessor ================
serialize(model, 'resnet50_ppp.xml', 'resnet50_ppp.bin')
```
The running result of export_resnet_ov_ppp.py is shown in the figure below:
As can be seen from the above code, with the OpenVINO™ preprocessing API, image resizing, color channel conversion, data normalization, and data layout conversion can all be integrated into the model, and the model with embedded preprocessing can be serialized to an IR model directly, without running the Model Optimizer.
The complete inference program based on resnet50_ppp.xml is as follows:
```python
from openvino.runtime import Core
import cv2
import numpy as np

core = Core()
resnet50_ppp = core.compile_model("resnet50_ppp.xml", "CPU")
output_node = resnet50_ppp.outputs[0]

blob = np.expand_dims(cv2.imread("cat.jpg"), 0)
result = resnet50_ppp(blob)[output_node]
print(np.argmax(result))
```
As shown above, with the IR model that has preprocessing embedded, the OpenVINO™ inference program becomes simpler, clearer, and easier to read: five lines of core Python code implement ResNet model inference with embedded preprocessing!
1.3 Using model caching to further shorten the first-inference latency
exist" Implementing the OpenVINO asynchronous reasoning program of the YOLOv5 model on Viper Canyon "discusses the end-to-end performance of AI applications. For first inference latency, the loading and compilation time of the model can greatly increase the end-to-end runtime of first inference.
Using model caching greatly shortens the first-inference latency, as shown in the figure below.
To use model caching, only one line of code needs to be added: core.set_property({'CACHE_DIR': './cache/ppp'}). The complete sample code is as follows:
```python
from openvino.runtime import Core
import cv2
import numpy as np

core = Core()
core.set_property({'CACHE_DIR': './cache/ppp'})  # Use model caching technology
resnet50_ppp = core.compile_model("resnet50_ppp.xml", "CPU")
output_node = resnet50_ppp.outputs[0]

blob = np.expand_dims(cv2.imread("cat.jpg"), 0)
result = resnet50_ppp(blob)[output_node]
print(np.argmax(result))
```
When the inference program is run a second time, the OpenVINO™ runtime loads the compiled model directly from the cache folder, which greatly reduces the first-inference latency.
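To observe the effect, a minimal sketch (a hypothetical measurement script, not part of the sample repository) that times compile_model() can be run twice; on the second run the compiled blob is loaded from ./cache/ppp and the reported time should drop sharply:

```python
import time
from openvino.runtime import Core

core = Core()
core.set_property({'CACHE_DIR': './cache/ppp'})  # enable model caching

start = time.perf_counter()
compiled_model = core.compile_model("resnet50_ppp.xml", "CPU")
print(f"compile_model() took {time.perf_counter() - start:.3f} s")
```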
1.4 Summary
This article has introduced in detail the techniques for embedding data preprocessing into AI models through the Model Optimizer and the OpenVINO™ preprocessing API. Embedding data preprocessing into the model simplifies the writing of inference programs, improves the utilization of inference devices, and improves the end-to-end performance of AI programs. Finally, this article also introduced model caching to further optimize the end-to-end first-inference latency of AI programs.