Compile and optimize models using TVMC

By super nerve

Contents at a Glance: This section explains how to compile and optimize models using TVMC, the command-line driver for TVM, which exposes TVM functionality through the command line. It lays the groundwork for understanding how TVM works.

Keywords: TVMC, TVM, machine learning

This section introduces TVMC, the command-line driver for TVM. TVMC executes TVM functions (including automatic tuning, compilation, analysis, and execution of models) through a command-line interface.

After completing this section, you will be able to use TVMC to:

* Compile a pretrained ResNet-50 v2 model for the TVM runtime.

* Run a real image through the compiled model, and interpret the output and model performance.

* Use TVM to tune the model on a CPU.

* Recompile an optimized model using the tuning data collected by TVM.

* Run an image through the optimized model, and compare the output and model performance.

This section provides an overview of the functionality of TVM and TVMC, and lays the foundation for understanding how TVM works.

Use TVMC

TVMC is a Python application and part of the TVM Python package. When you install TVM from the Python package, you get a command-line application called tvmc. The location of this command varies by platform and installation method.

Alternatively, if the TVM Python module is available on $PYTHONPATH, the command-line driver functionality can be accessed through the executable Python module (with the python -m tvm.driver.tvmc command).

In this tutorial, tvmc refers to either tvmc or python -m tvm.driver.tvmc; use whichever is available on your system.
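
When scripting these steps, it can help to pick whichever form of the command is available. A minimal sketch, assuming only the two invocation styles described above; the helper name tvmc_command is hypothetical:

```python
import shutil
import sys


def tvmc_command():
    """Return the argv prefix for invoking TVMC.

    Prefers a `tvmc` executable found on PATH; otherwise falls back to
    running the driver as a module with the current Python interpreter.
    """
    exe = shutil.which("tvmc")
    if exe is not None:
        return [exe]
    return [sys.executable, "-m", "tvm.driver.tvmc"]
```

For example, subprocess.run(tvmc_command() + ["--help"], check=True) is equivalent to tvmc --help.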

Use the following command to view the help page:

tvmc --help

The main TVM functionality available through tvmc comes from the subcommands compile, run, and tune. Use tvmc <subcommand> --help to view the options for a given subcommand.

This tutorial covers each of these commands. Before starting, download a pretrained model.

Get the model

In this tutorial, we use ResNet-50 v2, a 50-layer deep convolutional neural network for image classification. The model has been pretrained on more than 1 million images across 1000 classes, and its input images are 224x224 pixels.

To explore the structure of the ResNet-50 model in more depth, we recommend downloading Netron, a free ML model viewer.

Download Netron: https://netron.app/

This tutorial uses a model in ONNX format:

wget https://github.com/onnx/models/raw/b9a54e89508f101a1611cd64f4ef56b9cb62c7cf/vision/classification/resnet/model/resnet50-v2-7.onnx

Tip 1: Supported model formats

TVMC supports models created with Keras, ONNX, TensorFlow, TFLite, and Torch. The model format being used can be specified with the --model-format option. Execute tvmc compile --help for more information.

Tip 2: Adding ONNX support to TVM

TVM relies on the ONNX Python library being available on the system. Install it with pip3 install --user onnx onnxoptimizer. If you have root access and want to install ONNX globally, you can omit the --user option. The onnxoptimizer dependency is optional and is only used with onnx>=1.9.

Compile the ONNX model for the TVM runtime

After downloading the ResNet-50 model, compile it with tvmc compile. The output of compilation is a TAR package containing the model compiled as a dynamic library for the target platform. This model can then be run on the target device using the TVM runtime:

# It may take a few minutes, depending on the device
tvmc compile \
--target "llvm" \
--input-shapes "data:[1,3,224,224]" \
--output resnet50-v2-7-tvm.tar \
resnet50-v2-7.onnx
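
The value passed to --input-shapes pairs the input name with its NCHW dimensions (batch, channels, height, width). As a sketch, assuming the "name:[d1,d2,...]" syntax shown above and that several inputs are separated by spaces (check tvmc compile --help for your version), a hypothetical helper format_input_shapes can build the string:

```python
def format_input_shapes(shapes):
    """Build a value for `tvmc compile --input-shapes`.

    `shapes` maps input names to dimension sequences, e.g.
    {"data": (1, 3, 224, 224)}. Entries are joined with spaces,
    assuming TVMC's "name:[d1,d2,...]" syntax.
    """
    parts = []
    for name, dims in shapes.items():
        dims_str = ",".join(str(int(d)) for d in dims)
        parts.append(f"{name}:[{dims_str}]")
    return " ".join(parts)


# ResNet-50 v2 (ONNX) takes a single input named "data":
# batch size 1, 3 colour channels, 224x224 pixels.
print(format_input_shapes({"data": (1, 3, 224, 224)}))  # data:[1,3,224,224]
```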

To see the files that tvmc compile created in the module, extract the package and list its contents:

mkdir model
tar -xvf resnet50-v2-7-tvm.tar -C model
ls model

There are three files after decompression:

* mod.so is the model, compiled as a C++ library, which can be loaded by the TVM runtime.

* mod.json is the textual representation of the TVM Relay computation graph.

* mod.params is a file containing the parameters of the pretrained model.

The module can be loaded directly by your application, and the model can then be run through the TVM runtime APIs.
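
To check the package contents from Python instead of with tar and ls, the standard tarfile module is enough. This is a sketch; package_contents and check_package are hypothetical helper names, and the expected file names are the three listed above:

```python
import tarfile

# Files that `tvmc compile` is expected to place in the package.
EXPECTED_FILES = {"mod.so", "mod.json", "mod.params"}


def package_contents(path):
    """Return the base names of the regular files inside a .tar package."""
    with tarfile.open(path, "r") as tar:  # "r" auto-detects compression
        return {
            member.name.rsplit("/", 1)[-1]
            for member in tar.getmembers()
            if member.isfile()
        }


def check_package(path):
    """Raise FileNotFoundError if any expected file is missing."""
    missing = EXPECTED_FILES - package_contents(path)
    if missing:
        raise FileNotFoundError(f"{path} is missing: {sorted(missing)}")
    return True
```

For example, check_package("resnet50-v2-7-tvm.tar") returns True for a complete package.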

Tip 3: Define the correct target

Specifying the correct target (option --target) can greatly improve the performance of the compiled module, because it lets the compiler exploit hardware features available on the target. See Automatically Tuning Convolutional Networks for x86 CPU for more information. We recommend identifying the CPU model you are using, along with its optional features, and then setting the target accordingly.
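
On Linux, a CPU's optional features are listed in /proc/cpuinfo. The sketch below (hypothetical helpers parse_cpu_flags and llvm_target) collects those flags and builds a --target string; choosing the -mcpu value from the flags is left to you, since no mapping is given here:

```python
def parse_cpu_flags(cpuinfo_text):
    """Collect CPU feature flags from the text of /proc/cpuinfo.

    x86 kernels list them on "flags" lines; ARM kernels use "Features".
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("flags", "Features"):
            flags.update(value.split())
    return flags


def llvm_target(mcpu=None):
    """Return a --target string: plain "llvm", or "llvm -mcpu=<mcpu>"."""
    return f"llvm -mcpu={mcpu}" if mcpu else "llvm"


# Example (Linux only):
# with open("/proc/cpuinfo") as f:
#     flags = parse_cpu_flags(f.read())
# print("avx2" in flags, "avx512f" in flags)
# print(llvm_target("skylake"))  # llvm -mcpu=skylake
```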

Run models from compiled modules using TVMC

Once a model has been compiled into a module, the TVM runtime can use it to make predictions. TVMC has the TVM runtime built in, so it can run compiled TVM models directly.

To run the model and make predictions with TVMC, you need:

* The compiled module just generated.

* Valid inputs for the model to predict.

Models vary in tensor shape, format, and data type. Therefore, most models require pre-processing and post-processing to ensure that the input is valid and the output can be explained. TVMC uses NumPy's .npz format for input and output, which provides good support for serializing multiple arrays into a single file.
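
The names of the arrays in the .npz file matter: each array is stored under the keyword used when saving it, and TVMC looks the arrays up by the model's input names (which is why the preprocessing script below saves its array as data). A small sketch of writing and reading such a file, assuming the input name data used throughout this tutorial; example_input.npz is a hypothetical file name:

```python
import numpy as np

# One float32 batch in NCHW layout, matching --input-shapes "data:[1,3,224,224]".
batch = np.zeros((1, 3, 224, 224), dtype="float32")

# Each keyword argument becomes a named array in the archive.
np.savez("example_input.npz", data=batch)

# np.load returns a lazy, dict-like NpzFile for .npz archives;
# the context manager closes the file afterwards.
with np.load("example_input.npz") as npz:
    for name in npz.files:
        print(name, npz[name].shape, npz[name].dtype)  # data (1, 3, 224, 224) float32
```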

This tutorial uses an image of a cat as input, but you can use any other image you like.

Input preprocessing

The ResNet-50 v2 model expects its input in ImageNet format. Below is an example script that preprocesses an image for ResNet-50 v2.

First, run pip3 install --user pillow to install the Python Imaging Library (Pillow), which this script depends on.

#!python ./preprocess.py
from tvm.contrib.download import download_testdata
from PIL import Image
import numpy as np

img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
img_path = download_testdata(img_url, "imagenet_cat.png", module="data")

# Resize to 224x224 (convert to RGB first so the array always has 3 channels)
resized_image = Image.open(img_path).convert("RGB").resize((224, 224))
img_data = np.asarray(resized_image).astype("float32")

# ONNX expects NCHW input, so convert the array
img_data = np.transpose(img_data, (2, 0, 1))

# Normalize against ImageNet
imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_stddev = np.array([0.229, 0.224, 0.225])
norm_img_data = np.zeros(img_data.shape).astype("float32")
for i in range(img_data.shape[0]):
    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]

# Add batch dimension
img_data = np.expand_dims(norm_img_data, axis=0)

# Save as .npz (output imagenet_cat.npz)
np.savez("imagenet_cat", data=img_data)
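
The per-channel loop in the script can also be written with NumPy broadcasting. The sketch below (a hypothetical preprocess_array helper) works on an in-memory HxWx3 array, so it can be checked without downloading an image; its result should match the script's up to floating-point rounding:

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype="float32")
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype="float32")


def preprocess_array(hwc):
    """Turn an HxWx3 RGB array (values 0-255) into a 1x3xHxW float32 batch."""
    x = np.asarray(hwc, dtype="float32") / 255.0
    # Broadcasting over the last (channel) axis replaces the per-channel loop.
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    # HWC -> CHW, then add the batch dimension -> NCHW.
    return np.expand_dims(np.transpose(x, (2, 0, 1)), axis=0).astype("float32")
```

For example, preprocess_array(np.asarray(Image.open(img_path).convert("RGB").resize((224, 224)))) produces an array that can be saved with np.savez("imagenet_cat", data=...).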

Run the compiled module

With the model and input data in hand, let's run TVMC to make predictions:

tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
resnet50-v2-7-tvm.tar

The .tar model file contains a C++ library, a description of the Relay model, and the model's parameters. TVMC includes the TVM runtime, which loads the model and makes predictions on the input. When you run the command above, TVMC writes a new file, predictions.npz, containing the model's output tensors in NumPy format.

In this example, the model is compiled and run on the same machine. In some cases, you may want to run it remotely via an RPC Tracker. See tvmc run --help for more information on these options.
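
If you prefer to drive tvmc run from a Python script, the command can be assembled as an argument list. This sketch uses only the options that appear in this tutorial (--inputs, --output, --print-time, --repeat); build_run_command is a hypothetical helper:

```python
import shlex


def build_run_command(package, inputs, output, print_time=False, repeat=None,
                      tvmc=("tvmc",)):
    """Assemble a `tvmc run` argument list from this tutorial's options."""
    cmd = list(tvmc) + ["run", "--inputs", inputs, "--output", output]
    if print_time:
        cmd.append("--print-time")
    if repeat is not None:
        cmd += ["--repeat", str(repeat)]
    cmd.append(package)  # the compiled .tar package comes last
    return cmd


cmd = build_run_command("resnet50-v2-7-tvm.tar", "imagenet_cat.npz", "predictions.npz")
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting issues, which matters once options such as --target "llvm -mcpu=..." contain spaces.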

Output post-processing

As mentioned earlier, each model provides output tensors differently.

In this example, we need to post-process the output of ResNet-50 v2, using the lookup table provided for this model, to present it in a more human-readable form.

The script below is an example of postprocessing that extracts labels from the output of compiled modules:

#!python ./postprocess.py
import os.path
import numpy as np

from scipy.special import softmax

from tvm.contrib.download import download_testdata

# Download label list
labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
labels_path = download_testdata(labels_url, "synset.txt", module="data")

with open(labels_path, "r") as f:
    labels = [l.rstrip() for l in f]

output_file = "predictions.npz"

# Open and read the output tensor
if os.path.exists(output_file):
    with np.load(output_file) as data:
        scores = softmax(data["output_0"])
        scores = np.squeeze(scores)
        ranks = np.argsort(scores)[::-1]

        for rank in ranks[0:5]:
            print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
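
If you would rather not depend on SciPy, softmax and the top-5 selection take only a few lines of NumPy. A sketch, assuming, as the script does, a single output array of shape (1, 1000) stored under output_0; softmax and top_k here are hypothetical helpers, not TVM APIs:

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def top_k(logits, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(np.squeeze(np.asarray(logits, dtype="float64")))
    idx = np.argpartition(probs, -k)[-k:]    # top-k indices, unordered
    idx = idx[np.argsort(probs[idx])[::-1]]  # order them by probability
    return [(labels[i], float(probs[i])) for i in idx]
```

For example, inside with np.load("predictions.npz") as data:, top_k(data["output_0"], labels) returns the same five classes the script prints. np.argpartition avoids fully sorting all 1000 scores, although at this size the difference is negligible.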

The output of running this script is as follows:

python postprocess.py
# class='n02123045 tabby, tabby cat' with probability=0.610553
# class='n02123159 tiger cat' with probability=0.367179
# class='n02124075 Egyptian cat' with probability=0.019365
# class='n02129604 tiger, Panthera tigris' with probability=0.001273
# class='n04040759 radiator' with probability=0.000261

Replace the cat image above with other images and see what predictions the ResNet model makes.

Automatically tune ResNet models

The previous model was compiled to run on the TVM runtime, but it did not include any platform-specific optimizations. This section shows how to use TVMC to build a model optimized for your working platform.

Inference with compiled modules may sometimes not achieve the expected performance. In this case, the autotuner can be used to better configure the model and thus improve performance. Tuning in TVM refers to optimizing the model on a given target to make it run faster. Unlike training or fine-tuning, it does not affect the accuracy of the model, but only the runtime performance.

As part of the tuning process, TVM tries many different implementations of each operator and measures which performs best. The results of these runs are stored in a tuning records file, which is the final output of the tune command.

At a minimum, tuning requires:

* The platform specification of the target device that will run this model

* The path to the output file where tuning records are stored

* Path to the model to tune.

The following example demonstrates its workflow:

# The default search algorithm requires xgboost, see below for details on tuning the search algorithm
pip install xgboost

tvmc tune \
--target "llvm" \
--output resnet50-v2-7-autotuner_records.json \
resnet50-v2-7.onnx

You can get better results by giving the --target flag a more specific target. For example, on an Intel i7 processor you could use --target "llvm -mcpu=skylake". This tuning example performs local tuning on the CPU, using LLVM as the compiler for the specified architecture.

TVMC searches the parameter space of the model, tries different configurations for the operator, and chooses the configuration that runs the fastest on the platform. Although this is a guided search based on CPU and model operations, it still takes several hours to complete the search. The output of the search will be saved to the resnet50-v2-7-autotuner_records.json file, which will later be used to compile the optimized model.
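
The tuning records file can be inspected with the standard json module. The sketch below assumes the AutoTVM record layout: one JSON object per line, with the task name in input[1], its arguments in input[2], the measured costs in seconds in result[0], and an error code in result[1], where 0 means success. That layout is an assumption and may differ between TVM versions or tuning backends; best_records is a hypothetical helper:

```python
import json
from collections import defaultdict


def best_records(lines):
    """Map (task_name, args) -> best mean measured cost in seconds.

    Assumes AutoTVM-style JSON-lines records (see lead-in). Tasks are keyed
    by name *and* arguments, because one operator name (e.g. a conv2d
    template) usually appears with many different tensor shapes.
    """
    best = defaultdict(lambda: float("inf"))
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        key = (rec["input"][1], json.dumps(rec["input"][2], sort_keys=True))
        costs, error_no = rec["result"][0], rec["result"][1]
        if error_no != 0 or not costs:
            continue  # failed or invalid configuration
        best[key] = min(best[key], sum(costs) / len(costs))
    return dict(best)


# Example:
# with open("resnet50-v2-7-autotuner_records.json") as f:
#     summary = best_records(f)
# print(len(summary), "tuned workloads")
```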

Tip 4: Define the tuning search algorithm

By default, the search is guided by an XGBoost Grid algorithm. Depending on the model's complexity and the time available, you may want to choose a different algorithm. See tvmc tune --help for a complete list.

For a consumer-grade Skylake CPU, the output is as follows:

A tuning session can take a long time, so tvmc tune provides many options for customizing the tuning process, such as the number of repetitions (--repeat and --number) and the tuning algorithm to use. See tvmc tune --help for more information.

Compile an optimized model using tuning data

The tuning records are in the output file of the tuning process above, resnet50-v2-7-autotuner_records.json.

This file can be used to:

* as input for further tuning (via tvmc tune --tuning-records)

* as input to the compiler

Execute the tvmc compile --tuning-records command to let the compiler use this result to generate high-performance code for the model on the specified target. See tvmc compile --help for more information.

After the tuning data of the model is collected, the optimized operator can be used to recompile the model to speed up the calculation.

tvmc compile \
--target "llvm" \
--tuning-records resnet50-v2-7-autotuner_records.json  \
--output resnet50-v2-7-tvm_autotuned.tar \
resnet50-v2-7.onnx

Verify that the optimized model runs and produces the same results:

tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz \
resnet50-v2-7-tvm_autotuned.tar

python postprocess.py

Verify that the predictions are the same, up to small floating-point differences:

# class='n02123045 tabby, tabby cat' with probability=0.610550
# class='n02123159 tiger cat' with probability=0.367181
# class='n02124075 Egyptian cat' with probability=0.019365
# class='n02129604 tiger, Panthera tigris' with probability=0.001273
# class='n04040759 radiator' with probability=0.000261
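
Both tvmc run commands in this tutorial write to predictions.npz, so the second run overwrites the first. To compare the two programmatically, write each run to its own file (for example, the hypothetical predictions_untuned.npz and predictions_tuned.npz) and check the top classes and probabilities with a tolerance. A sketch, assuming the output_0 key used by postprocess.py; same_predictions is a hypothetical helper:

```python
import numpy as np


def _probs(path, key):
    """Load one output tensor and convert its logits to probabilities."""
    with np.load(path) as data:
        logits = np.squeeze(data[key]).astype("float64")
    e = np.exp(logits - logits.max())
    return e / e.sum()


def same_predictions(path_a, path_b, key="output_0", top=5, atol=1e-4):
    """True if both files rank the same top classes with near-equal probabilities."""
    pa, pb = _probs(path_a, key), _probs(path_b, key)
    top_a = np.argsort(pa)[::-1][:top]
    top_b = np.argsort(pb)[::-1][:top]
    return bool(np.array_equal(top_a, top_b) and np.allclose(pa, pb, atol=atol))
```

Comparing probabilities with a tolerance, rather than requiring exact equality, reflects that the tuned and untuned models can differ in the last few digits, as in the outputs above.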

Comparing tuned and untuned models

TVMC provides basic tools for benchmarking models against each other. You can specify the number of repetitions and have TVMC report the model's run time (independent of runtime startup). This gives a rough idea of how much tuning has improved the model's performance.

For example, when tested on an Intel i7 system, the tuned model's mean run time is about 52% lower than the untuned model's (92.19 ms vs. 193.32 ms, roughly a 2.1x speedup):

tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz  \
--print-time \
--repeat 100 \
resnet50-v2-7-tvm_autotuned.tar

# Execution time summary:
# mean (ms)   max (ms)    min (ms)    std (ms)
#     92.19     115.73       89.85        3.15

tvmc run \
--inputs imagenet_cat.npz \
--output predictions.npz  \
--print-time \
--repeat 100 \
resnet50-v2-7-tvm.tar

# Execution time summary:
# mean (ms)   max (ms)    min (ms)    std (ms)
#    193.32     219.97      185.04        7.11
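
The relative improvement can be computed directly from the mean latencies in the two summaries. A short worked example (compare_latency is a hypothetical helper):

```python
def compare_latency(untuned_ms, tuned_ms):
    """Return (speedup factor, percent reduction in mean latency)."""
    speedup = untuned_ms / tuned_ms
    reduction_pct = (1 - tuned_ms / untuned_ms) * 100
    return speedup, reduction_pct


# Mean latencies from the two `tvmc run --print-time` summaries above.
speedup, reduction = compare_latency(untuned_ms=193.32, tuned_ms=92.19)
print(f"speedup: {speedup:.2f}x, latency reduced by {reduction:.1f}%")
# speedup: 2.10x, latency reduced by 52.3%
```

Note that "x% faster" is ambiguous; reporting both the speedup factor and the latency reduction avoids confusing the two.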

Final remarks

This tutorial introduced TVMC, the command-line driver for TVM, showed how to compile, run, and tune models, and discussed the need for pre- and post-processing of inputs and outputs. Finally, it showed how to compare the performance of the unoptimized and optimized models.

This tutorial walked through a simple local example with ResNet-50 v2. However, TVMC supports many more features, including cross-compilation, remote execution, and profiling/benchmarking.

Use the tvmc --help command to see other available options.

The next tutorial, Compiling and Optimizing a Model with the Python Interface, will cover the same compilation and optimization steps with the Python interface.


Posted by chord on Mon, 27 Feb 2023 19:12:28 +1030