#Environment configuration (installation of OpenVINO)

[OpenVINO + Paddle] COVID-19 CT image classification and lesion segmentation on CPU

I wanted to try building this project after reading a project article by an expert developer. It trains COVID-19 CT classification and segmentation models with PaddleClas and PaddleSeg on PaddlePaddle, exports them as ONNX models, downloads the ONNX models to a local device, converts them into IR models with OpenVINO, and then processes COVID-19 CT images on a CPU.

Here I provide all the data and code, all of which I have run successfully. I have uploaded my source code and the relevant data to Baidu AI Studio. You can find them at the following link:
https://aistudio.baidu.com/aistudio/projectdetail/3460633

Because the OpenVINO part is simple, this article first shows the results of OpenVINO inference on a CPU, and then shows how to train and export the models on PaddlePaddle.
Let's first look at the OpenVINO results:

OpenVINO inference

The IR model is provided directly here. Later sections explain how to export an ONNX model after training on PaddlePaddle and convert it into an IR model. You can download the IR model from the link (Linked model).
Once downloaded, just put it next to your Jupyter notebook.
Running the code requires the OpenVINO notebooks environment. For its installation, please refer to my other blog post on OpenVINO notebooks and environment configuration. You can read through the code first; it is not too difficult.

First, import the required libraries.

import os
import sys
import zipfile
from pathlib import Path

from openvino.inference_engine import IECore

sys.path.append("../utils")
from models.custom_segmentation import SegmentationModel
from notebook_utils import benchmark_model, download_file, show_live_inference

If you have downloaded the model from the link, put its path between the double quotation marks below. For the downloaded IR model, set the path to "pretrained_model/unet44.xml".

MODEL_PATH = "pretrained_model/quantized_unet_kits19.xml"

Next, select the hardware to run on. Besides the CPU, a GPU can also be used if one is available.

ie = IECore()
device = "MULTI:CPU,GPU" if "GPU" in ie.available_devices else "CPU"
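
The device choice above can be sketched as a small standalone function, which makes it easy to try without OpenVINO installed. This is only a sketch: pick_device is a hypothetical helper, not part of OpenVINO, and it assumes that systems with several GPUs may report names such as "GPU.0" and "GPU.1" instead of plain "GPU".

```python
def pick_device(available_devices):
    """Return a MULTI device string when any GPU is listed, otherwise "CPU".

    available_devices is a list of names such as the one returned by
    ie.available_devices. Matching on the part before the dot is an
    assumption that covers names like "GPU.0".
    """
    has_gpu = any(name.split(".")[0] == "GPU" for name in available_devices)
    return "MULTI:CPU,GPU" if has_gpu else "CPU"


print(pick_device(["CPU"]))
print(pick_device(["CPU", "GPU.0", "GPU.1"]))
```

In the notebook, the result of pick_device(ie.available_devices) could be used in place of the one-line expression above.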

To measure the inference performance of the model, OpenVINO's Benchmark Tool is used here. You can run it directly in the notebook with !benchmark_app or %sx benchmark_app. Here we use the wrapper function from Notebook Utils instead.

benchmark_model(model_path=MODEL_PATH, device=device, seconds=15)

Downloading and preparing data
The code below downloads the data from a direct link. Note that no training happens here: it simply downloads one small case of CT frames, and only if that case is not already present at the path.

BASEDIR = Path("kits19_frames_1")
CASE = 117

case_path = BASEDIR / f"case_{CASE:05d}"

if not case_path.exists():
    filename = download_file(
        f"https://storage.openvinotoolkit.org/data/test_data/openvino_notebooks/kits19/case_{CASE:05d}.zip"
    )
    with zipfile.ZipFile(filename, "r") as zip_ref:
        zip_ref.extractall(path=BASEDIR)
    os.remove(filename)  # remove zipfile
    print(f"Downloaded and extracted data for case_{CASE:05d}")
else:
    print(f"Data for case_{CASE:05d} exists")

Show live inference
To display live inference in the notebook, we use the asynchronous processing feature of the OpenVINO Inference Engine. (There are several inference modes; synchronous and asynchronous inference are covered in the OpenVINO course on my blog.)
We use the show_live_inference function from Notebook Utils to display live inference. This function uses Open Model Zoo's AsyncPipeline and Model API to perform asynchronous inference. When inference on the specified CT scan is complete, the total time and throughput (FPS), including preprocessing and displaying the results, are printed.

ie = IECore()
segmentation_model = SegmentationModel(ie=ie, model_path=Path(MODEL_PATH), sigmoid=True)
image_paths = sorted(case_path.glob("imaging_frames/*jpg"))

print(f"{case_path.name}, {len(image_paths)} images")

Inference
Here we run the show_live_inference function, which loads the segmentation model to the specified device, loads the images, performs inference, and displays the results on the loaded frames in real time.

device = "MULTI:CPU,GPU" if "GPU" in ie.available_devices else "CPU"
show_live_inference(
    ie=ie, image_paths=image_paths, model=segmentation_model, device=device
)
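
To make the reported numbers concrete, here is a minimal sketch of how total time and throughput (FPS) relate. It is synchronous and uses a stand-in workload, so it is not how show_live_inference is implemented; measure_throughput and the dummy frames are hypothetical names used only for illustration.

```python
import time


def measure_throughput(process_frame, frames):
    """Run process_frame on every frame and return (total_seconds, fps)."""
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    total = time.perf_counter() - start
    # Throughput is simply the number of processed frames per second.
    fps = len(frames) / total if total > 0 else float("inf")
    return total, fps


# Stand-in workload: summing a short list plays the role of one inference.
dummy_frames = [[1, 2, 3]] * 1000
total, fps = measure_throughput(lambda frame: sum(frame), dummy_frames)
print(f"{len(dummy_frames)} frames in {total:.4f} s, {fps:.1f} FPS")
```

An asynchronous pipeline usually reaches a higher FPS than this loop on the same hardware, because preprocessing and inference of different frames can overlap.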

If anything is unclear, you can consult the projects and technical documents I referred to while working on this.

  1. https://aistudio.baidu.com/aistudio/projectdetail/3459413
  2. https://aistudio.baidu.com/aistudio/projectdetail/3460443?forkThirdPart=1
  3. https://aistudio.baidu.com/aistudio/projectdetail/3460337?contributionType=1
  4. https://aistudio.baidu.com/aistudio/projectdetail/3461846
  5. https://aistudio.baidu.com/aistudio/projectdetail/3460268?forkThirdPart=1
  6. https://aistudio.baidu.com/aistudio/projectdetail/3460317?contributionType=1

Data set acquisition

COVID-19 is visible in images of the lungs. The first image below is a lung CT image of a COVID-19 patient. The second shows the CT image of a healthy person, which differs clearly from that of the patient.


The dataset used is the COVID-19 Radiography Database; the link is below. It combines scans from Italy, the ieee8023 repository, and more than 40 papers into a dataset with 219 COVID-19 cases, 1341 normal scans, and 1345 viral pneumonia scans. Open the link to reference the dataset directly.
https://aistudio.baidu.com/aistudio/datasetdetail/34241

Medical disclaimer for the data: the 97% accuracy is only a result on the experimental dataset. Any algorithm intended for clinical use needs to be tested in the actual use environment. The classification results of this model cannot be used as a basis for clinical diagnosis or treatment.

The dataset used for segmentation is COVID-19 CT scans. It contains 20 COVID-19 CT scans collected by ieee8023, annotated with the left lung, the right lung, and the infected regions. The following is an example of a label:

The following is the link to the data. You can download it or reference it directly in your own AI Studio project.
https://aistudio.baidu.com/aistudio/datasetdetail/34221
Medical disclaimer for the data: any algorithm intended for clinical use needs to be tested in the actual use environment, and the results of this model cannot be used as a basis for clinical diagnosis or treatment.

Training and export of the PaddleClas COVID-19 CT classification model

1. Data set preprocessing

Run the following code to decompress the dataset. Note that this section later produces three txt files in the data directory, next to the images directory we create. Because this project decompresses two datasets and both write their list files to the same place, you need to delete these three txt files after training the first model. If you are using AI Studio, check the ~/data/images directory. It contains three folders, COVID-19, Viral Pneumonia and NORMAL, which store the images of the three classes.

!mkdir /home/aistudio/data/images
!unzip -q /home/aistudio/data/data34241/covid19-combo.zip -d /home/aistudio/data/images 
!mv /home/aistudio/data/images/'COVID-19 Radiography Database'/* /home/aistudio/data/images
!rm -rf /home/aistudio/data/images/'COVID-19 Radiography Database'
!ls ~/data/images

PaddleClas also needs data list files, in which each sample is written as "file path|category" for the subsequent training. The data also has to be split into training, validation and test sets. The code is as follows.

%cd ~
import os
base_dir = "/home/aistudio/data/images/" # Path of the CT images
img_dirs = ["COVID-19", "NORMAL", "Viral Pneumonia"] # Folder names of the three image classes

file_names = ["train_list.txt", "val_list.txt", "test_list.txt"]
splits = [0, 0.6, 0.8, 1] # Split the data into training, validation and test sets at a ratio of 6:2:2

for split_ind, file_name in enumerate(file_names):
    with open(os.path.join("./data", file_name), "w") as f:
        for type_ind, img_dir in enumerate(img_dirs):
            imgs = os.listdir(os.path.join(base_dir, img_dir) )
            for ind in range( int(splits[split_ind]* len(imgs)), int(splits[split_ind + 1] * len(imgs)) ):
                print("{}|{}".format(img_dir + "/" + imgs[ind], type_ind), file = f)

After making the file list, you can use head to check the first 10 lines.

! head /home/aistudio/data/train_list.txt
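
The list-building code above takes the files in directory order, so each split is a contiguous block of that order. As a hedged alternative, the sketch below shuffles the file names with a fixed seed before cutting them at the same 6:2:2 boundaries. split_items and the demo file names are hypothetical, and this is not what the original project does.

```python
import random


def split_items(items, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle items reproducibly and cut them into consecutive parts."""
    items = sorted(items)               # fixed starting order, whatever the input order
    random.Random(seed).shuffle(items)  # reproducible shuffle
    count = len(items)
    bounds, acc = [0], 0.0
    for fraction in fractions:
        acc += fraction
        bounds.append(int(round(acc * count)))
    bounds[-1] = count                  # guard against floating point drift
    return [items[bounds[i]:bounds[i + 1]] for i in range(len(fractions))]


names = [f"img_{i}.png" for i in range(10)]
train, val, test = split_items(names)
print(len(train), len(val), len(test))
```

Each part could then be written to train_list.txt, val_list.txt and test_list.txt in the same "path|class" format as above.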

2. PaddleClas configuration

I have uploaded a compressed PaddleClas package to the platform. You can download it from my project and decompress it. It is the code package cloned with git from the PaddlePaddle repository, trimmed to less than 150 MB for easier transfer. After decompression, run the pip install line below to initialize the environment. You can open requirements.txt to see which dependencies are installed.

!unzip -q pdclas.zip
%cd pdclas
!pip install -r requirements.txt

I use a GPU for model training, so the GPU build of PaddlePaddle needs to be installed before training.

!python -m pip install paddlepaddle-gpu==2.1.3.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

3. Model training

Now comes the key step, model training. To train a model with PaddleClas, you need to write a config file that defines the details of training, such as the number of epochs and the batch size. You can modify the covid-2.yaml configuration file according to your hardware. If the file is missing, you can copy the configuration below. Note that the file I provide sets batch_size to 16 because I use a graphics card with 32 GB of memory. To obtain a model with an accuracy above 95%, my configuration file sets epochs to 15, and training takes about half an hour.

mode: 'train'
ARCHITECTURE:
    # Model architecture to use. To switch, refer to the config files of other architectures under pdclas/config
    # For example, ResNet101
    name: 'ResNet50_vd'
pretrained_model: "" # Transfer learning from a pretrained model usually works well on small datasets, but the pretrained models were trained on natural images, so none is used here
model_save_dir: "./output/"
classes_num: 3
total_images: 2905
save_interval: 1
validate: True
valid_interval: 1
epochs: 20 
topk: 2
image_shape: [3, 1024, 1024]


LEARNING_RATE:
    function: 'Cosine'    
    params:                   
        lr: 0.00375

OPTIMIZER:
    function: 'Momentum'
    params:
        momentum: 0.9
    regularizer:
        function: 'L2'
        factor: 0.000001

TRAIN:
    batch_size: 4 # Batch size during training. With a 32 GB graphics card, it can be raised to at most 16
    num_workers: 4
    file_list: "/home/aistudio/data/train_list.txt"
    data_dir: "/home/aistudio/data/images/"
    delimiter: "|"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            to_np: False
            channel_first: False
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1./255.
        - ToCHWImage:

VALID:
    batch_size: 20
    num_workers: 4
    file_list: "/home/aistudio/data/val_list.txt"
    data_dir: "/home/aistudio/data/images/"
    delimiter: "|"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            to_np: False
            channel_first: False
        - ResizeImage:
            resize_short: 1024
        - NormalizeImage:
            scale: 1.0/255.0
        - ToCHWImage:
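
The LEARNING_RATE block above selects a 'Cosine' schedule with lr 0.00375. As a rough illustration, the sketch below implements the standard cosine decay formula. It is an assumption that PaddleClas follows this formula exactly, since the source does not show its implementation, and cosine_lr and total_steps are hypothetical names.

```python
import math


def cosine_lr(base_lr, step, total_steps):
    """Standard cosine decay from base_lr at step 0 down to 0 at total_steps."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


total_steps = 100  # illustrative value, not taken from the config
for step in (0, 25, 50, 75, 100):
    print(step, f"{cosine_lr(0.00375, step, total_steps):.6f}")
```

The learning rate falls slowly at first, fastest in the middle of training, and slowly again near the end.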

Next, start the training. PaddleClas is generally used from the command line. Note that the PYTHONPATH environment variable needs to be set before training starts. If an error mentioning paddle.enable_static() is reported, open the file named in the error, such as train.py or export_model.py (used later for export), and add the following two lines at the top:

import paddle
paddle.enable_static()

Then set the environment variable and start the training:

%cd ~/pdclas/
import os 
os.environ['PYTHONPATH']="/home/aistudio/pdclas"
!python -m paddle.distributed.launch --selected_gpus="0" tools/train.py -c ../covid-2.yaml 

4. Model export

After training, the models are stored at the path specified in the configuration. If you did not change it, run the following code to see the saved models.

!ls ~/pdclas/output/ResNet50_vd

The model conversion script provided in pdclas converts the training model into an inference model. Two files are generated by the conversion: model holds the model structure and params holds the model weights. The weight files saved by the Paddle framework come in two types: training models, which support both the forward pass and backward gradients, and inference models, which support only the forward pass. The inference model is optimized for inference speed and GPU memory: tensors needed only during training are pruned, which reduces memory use, and speed optimizations such as layer fusion and kernel selection are applied. The models saved by ppcls during training are training models. Here we use the inference model for two reasons: it is more convenient for exporting the ONNX model, and it is smaller and therefore easier to transfer.

!python tools/export_model.py --m=ResNet50_vd --p=output/ResNet50_vd/best_model_in_epoch_0/ppcls --o=../inference
!ls -lh /home/aistudio/inference/

Here I go back to the starting directory and list it, to show all the files of this project on the PaddlePaddle side. cd ~ returns to the home directory, where inference will be run.

%cd ~/
!ls

You can run the model on any image from the dataset and compare the prediction with the label assigned by the script that generated the list files: COVID-19 is class 0, normal is class 1, and viral pneumonia is class 2.

!python /home/aistudio/pdclas/tools/infer/predict.py --use_gpu=0 -i="/home/aistudio/COVID-19 (10).png"     -m=/home/aistudio/inference/model     -p=/home/aistudio/inference/params 
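
To read a prediction against the class indices above, a small mapping helps. The sketch below assumes the model yields one score per class in the order used when the list files were written. The exact output format of predict.py is not shown in the source, so the scores list here is hypothetical.

```python
# Class order matches img_dirs in the list-generation script: index 0, 1, 2.
CLASS_NAMES = ["COVID-19", "NORMAL", "Viral Pneumonia"]


def top_class(scores):
    """Return (index, name) of the highest-scoring class."""
    index = max(range(len(scores)), key=lambda i: scores[i])
    return index, CLASS_NAMES[index]


# Hypothetical scores for one image, one value per class.
print(top_class([0.91, 0.06, 0.03]))
```

If the order of the folders in the list-generation script is changed, CLASS_NAMES has to be changed to match.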

In my environment, paddle2onnx is installed in the PaddleSeg part below. So first go through the export of the segmentation model there, then come back and run the following command; that way it will not fail.

!paddle2onnx \
    --model_dir inference/ \
    --model_filename model \
    --params_filename params \
    --save_file model_1.onnx \
    --opset_version 12

COVID-19 CT segmentation and ONNX export

1. Data set preprocessing

Install the nibabel library to read data in nii format.

!pip install --upgrade nibabel -i https://mirror.baidu.com/pypi/simple

Decompress the dataset

%cd ~/data/data34221/
!unzip -q  -d .. 20_ncov_scan.zip    # Scan data
!unzip -q -d ../Infection_Mask Infection_Mask.zip  # Infection focus segmentation label
!unzip -qd ../Lung_Mask Lung_Mask.zip  # Left and right lung segmentation label
# !unzip -qd ../Lung_Infection Lung_Infection.zip   # The label of combined lung and infection focus is not used in the project
!ls ~/data/20_ncov_scan

As you can see above, 20 scans have been decompressed. The PaddleSeg framework only accepts images as input, so the CT scans in nii format need to be preprocessed and converted into images. In this process we also clip the scan data to the range [-512, 512], so that noise with very high or very low intensity does not affect the training.

import os 
import nibabel as nib
import numpy as np 
from tqdm import tqdm
import cv2

def listdir(path):
    dirs = os.listdir(path)
    dirs.sort()  # The file names of scanning and label are not exactly the same. Sorting all files in the two directories can ensure that they can match
    return dirs

scan_dir = "/home/aistudio/data/20_ncov_scan" # CT scan data path
label_dir = "/home/aistudio/data/Infection_Mask" # Path of lesion segmentation label
output_dir = "/home/aistudio/data/prep" 
scan_output = os.path.join(output_dir, "image") # CT picture output path
label_output = os.path.join(output_dir, "annotation") # Label picture output path

if not os.path.exists(scan_output):
    os.makedirs(scan_output)
if not os.path.exists(label_output):
    os.makedirs(label_output)

wl, wh = (-512, 512) # Intensity range of windowing CT

scan_fnames = listdir(scan_dir)
label_fnames = listdir(label_dir)

for case_ind in tqdm( range(len(scan_fnames)) ):
    scan_fname = scan_fnames[case_ind]
    label_fname = label_fnames[case_ind]

    scanf = nib.load(os.path.join(scan_dir, scan_fname)) # Reading data using nibabel Library
    scan = scanf.get_fdata()
    labelf = nib.load(os.path.join(label_dir, label_fname))
    label = labelf.get_fdata()

    scan = np.rot90(scan) # Correct the direction of the read data and rotate it 90 degrees counterclockwise
    label = np.rot90(label)

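    # Windowing operation to convert the range to 0 ~ 255, which is convenient for storing pictures
    scan = scan.clip(wl, wh).astype("float32")
    scan = ((scan - wl) / (wh - wl) * 255).astype("uint8")  # 255, not 256, so the maximum fits in uint8
    label = label.astype("uint8")  # Store the label values as 8-bit integers

    # rstrip(".nii.gz") strips characters, not a suffix, so cut the extension explicitly
    case_name = scan_fname[:-len(".nii.gz")] if scan_fname.endswith(".nii.gz") else scan_fname

    for sli_ind in range(label.shape[2]):
        scan_slice_path = os.path.join(scan_output, "{}-{}.png".format(case_name, sli_ind))
        label_slice_path = os.path.join(label_output, "{}-{}.png".format(case_name, sli_ind))
        cv2.imwrite(scan_slice_path, scan[:, :, sli_ind])
        cv2.imwrite(label_slice_path, label[:, :, sli_ind])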
print("Picture conversion complete")

! ls ~/data/prep/image -l | wc -l # You can see that more than 3500 pictures have been generated
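
The windowing step can be tried on its own with a few sample intensities. This is a minimal sketch assuming NumPy is installed. window_to_uint8 is a hypothetical helper, and it scales to 255 so that the upper bound of the window maps to the largest 8-bit value.

```python
import numpy as np


def window_to_uint8(volume, low=-512, high=512):
    """Clip intensities to [low, high] and rescale them linearly to 0..255."""
    clipped = np.clip(np.asarray(volume, dtype=np.float32), low, high)
    scaled = (clipped - low) / (high - low) * 255.0
    return np.round(scaled).astype(np.uint8)


# Values below or above the window are clamped to its edges first.
demo = np.array([-2000.0, -512.0, 256.0, 512.0, 3000.0])
print(window_to_uint8(demo))
```

Everything below -512 becomes black and everything above 512 becomes white, so the 256 grey levels are spent on the intensity range that matters for the lungs.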

Note that because this project decompresses two datasets and the generated list files share the same path, you need to delete the three txt files from the classification step before continuing. After data preprocessing, PaddleSeg needs a file list (txt) for each of the training, validation and test sets. The code below writes the paths of all the data into three txt files according to the split ratio of the three sets.

import os 
data_base_dir = "/home/aistudio/data/prep"
scan_folder = "image"
label_folder = "annotation"
txt_path = "/home/aistudio/data/"

split = [0, 0.7, 0.9, 1.0] # The split ratio of the training, validation and test sets is 7:2:1
list_names = ["train_list.txt", "val_list.txt", "test_list.txt"]
curr_type = 0

img_count = len(os.listdir( os.path.join(data_base_dir, scan_folder ) ) )
split = [int(x * img_count) for x in split]

f = open(os.path.join(txt_path, list_names[curr_type]), "w")
for ind, slice_name in enumerate(os.listdir( os.path.join(data_base_dir, scan_folder)) ):
    if ind < img_count - 1 and ind == split[curr_type + 1]:
        curr_type += 1
        f.close()
        f = open(os.path.join(txt_path, list_names[curr_type]), "w")
    print("{}|{}".format(os.path.join(scan_folder, slice_name), os.path.join(label_folder, slice_name)), file=f)
f.close()
# You can use the head command to see the generated results
!head ~/data/train_list.txt

2. PaddleSeg configuration

Unzip the provided PaddleSeg code package here. You can also download it with git. The file has been uploaded to the project.

%cd /home/aistudio
!unzip paddleSeg.zip

This step is the same as before. If you have already installed the GPU build of PaddlePaddle, you can skip it.

!python -m pip install paddlepaddle-gpu==2.1.3.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

Run this to install the dependencies for the segmentation environment.

!pip install -r paddleSeg/requirements.txt

3. Model training

If an error mentioning paddle.enable_static() is reported after running the training code, open the training file train.py and add the following code.

import paddle
paddle.enable_static()

Below is the configuration file. I have also uploaded my own version to the project. Because I need a high-accuracy model and I use a 32 GB graphics card, my parameters differ from the ones below: NUM_EPOCHS is 18 and BATCH_SIZE is 20.

# Dataset configuration
DATASET:
    DATA_DIR: "/home/aistudio/data/prep" # Base data path, joined with the paths in the file list to form the actual file paths
    NUM_CLASSES: 2  # Two classes: lesion and non-lesion
    TRAIN_FILE_LIST: "/home/aistudio/data/train_list.txt" # File list path for training, validation and test sets
    VAL_FILE_LIST: "/home/aistudio/data/val_list.txt"
    TEST_FILE_LIST: "/home/aistudio/data/test_list.txt"
    SEPARATOR: "|" # Split the training data and label path with | in the file list
    IMAGE_TYPE: "gray" # Use gray image and single channel for training

# Pre training model configuration
MODEL:
    MODEL_NAME: "unet" # Use the U-Net architecture. Available architectures include deeplabv3p, unet, icnet, pspnet and hrnet
    DEFAULT_NORM_TYPE: "bn"

# Other configurations
TRAIN_CROP_SIZE: (512, 512) # Training input data size
EVAL_CROP_SIZE: (512, 512)
AUG:
    AUG_METHOD: "unpadding"
    FIX_RESIZE_SIZE: (512, 512)
    MIRROR: True # Left and right mirror data enhancement

BATCH_SIZE: 8 # With a 32 GB graphics card, this can be raised to about 20 at most
TRAIN:
    MODEL_SAVE_DIR: "./saved_model/unet_covid/"
    SNAPSHOT_EPOCH: 1
TEST:
    TEST_MODEL: "./saved_model/unet_covid/final"
SOLVER:
    NUM_EPOCHS: 20 # Training takes a long time. About 85% accuracy can be reached in 15 to 20 epochs
    LR: 0.001
    LR_POLICY: "poly"
    OPTIMIZER: "adam"
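
The SOLVER block above uses LR_POLICY "poly" with a base LR of 0.001. The sketch below shows the usual polynomial decay formula. The power of 0.9 is a common default and an assumption here, since the configuration above does not set it, and poly_lr and total_steps are hypothetical names.

```python
def poly_lr(base_lr, step, total_steps, power=0.9):
    """Polynomial decay: base_lr * (1 - step / total_steps) ** power."""
    return base_lr * (1.0 - step / total_steps) ** power


total_steps = 1000  # illustrative value, not taken from the config
for step in (0, 250, 500, 750, 1000):
    print(step, f"{poly_lr(0.001, step, total_steps):.6f}")
```

With a power below 1, the learning rate decreases almost linearly for most of training and drops to zero at the last step.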

Start training

%cd ~/paddleSeg
!python pdseg/train.py --cfg ~/covid.yaml --use_gpu --use_mpio --do_eval --use_vdl --vdl_log_dir ~/log

4. Model export

Export the model. If the same error still appears, open the export file and add the two lines of code above.

!python pdseg/export_model.py --cfg ~/covid.yaml TEST.TEST_MODEL ./saved_model/unet_covid/final/

Install paddle2onnx and its related tools

!pip install pycocotools paddle2onnx
!pip install onnx==1.9.0

Download the code file of paddle2onnx from GitHub

%cd ~/
!git clone https://github.com/paddlepaddle/paddle2onnx --depth 1

Install paddle2onnx from the downloaded source

!cd ~/paddle2onnx/ && python setup.py install
%cd ~/paddleSeg/
!ls

Here we export the ONNX model of our segmentation model. You can specify the export path and then download the file to your computer.

!paddle2onnx \
    --model_dir freeze_model \
    --model_filename __model__ \
    --params_filename __params__ \
    --save_file model.onnx \
    --opset_version 12
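
The article mentions converting the downloaded ONNX model into an IR model with OpenVINO but does not show the command. As a hedged sketch, the helper below builds a Model Optimizer command line. It assumes the openvino-dev package is installed locally and provides the mo command, and build_mo_command, convert_to_ir and the ir_model directory are hypothetical names.

```python
import subprocess
from pathlib import Path


def build_mo_command(onnx_path, output_dir="ir_model"):
    """Build the Model Optimizer command that converts an ONNX file to IR."""
    return [
        "mo",
        "--input_model", str(Path(onnx_path)),
        "--output_dir", str(Path(output_dir)),
    ]


def convert_to_ir(onnx_path, output_dir="ir_model"):
    """Run the conversion; requires the mo command on the PATH."""
    subprocess.run(build_mo_command(onnx_path, output_dir), check=True)


# Only print the command here; call convert_to_ir("model.onnx") to run it.
print(" ".join(build_mo_command("model.onnx")))
```

The conversion should produce an .xml file and a .bin file in the output directory, and the .xml path is what MODEL_PATH in the OpenVINO part points to.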

Although our main task is to export the ONNX model, you can also run inference with this model to get the segmented image of a patient's lungs.

!python infer.py --conf=/home/aistudio/infer.yaml --input_dir=/home/aistudio/inference --image_dir="/home/aistudio/"

Tags: OpenVINO paddle

Posted by SoulAssassin on Sat, 05 Mar 2022 12:07:32 +1030