Radiological Genomic Classification of Brain Tumors Based on PaddleClas

1. Project introduction

AI Talent Special Training Camp Phase II

Malignant brain tumors are life-threatening. Glioblastoma is both the most common form of brain cancer in adults and the one with the worst prognosis, with a median survival of less than a year. The presence in the tumor of a specific genetic characteristic known as MGMT promoter methylation has been shown to be a favorable prognostic factor and a strong predictor of response to chemotherapy.

Currently, genetic analysis of cancer requires surgical extraction of tissue samples. It may then take several weeks to determine the genetic characteristics of the tumor. Depending on the results and the type of initial treatment chosen, follow-up surgery may be required. An accurate method of predicting cancer genetics from imaging alone (i.e., radiogenomics) could reduce the number of surgeries and help refine the choice of treatment. The Radiological Society of North America (RSNA) is partnering with the Society for Medical Image Computing and Computer-Aided Intervention (MICCAI Society) to improve diagnosis and treatment planning for patients with glioblastoma.

2. Dataset

2.1 Dataset Introduction

The dataset contains a total of 7022 images, divided into a training set and a test set. Each image belongs to one of four classes: glioma, meningioma, no tumor, and pituitary.

2.2 Decompress the dataset

!unzip /home/aistudio/data/data180229/archive.zip #data set
!unzip /home/aistudio/data/data90342/PaddleClas-release-2.1.zip #PaddleClas
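
After unpacking, it can be useful to check how many images each class folder holds. The following is a minimal sketch, assuming the archive unpacks into Training/ and Testing/ folders that each contain one subfolder per class (as the later cells imply). The helper name count_images_per_class is made up for illustration.

```python
import os


def count_images_per_class(root_dir):
    """Count the files in each immediate subfolder of root_dir (one subfolder per class)."""
    counts = {}
    for class_name in sorted(os.listdir(root_dir)):
        class_dir = os.path.join(root_dir, class_name)
        if os.path.isdir(class_dir):
            counts[class_name] = sum(
                1 for name in os.listdir(class_dir)
                if os.path.isfile(os.path.join(class_dir, name))
            )
    return counts


if __name__ == "__main__":
    # Only report the splits that actually exist in the working directory
    for split in ("Training", "Testing"):
        if os.path.isdir(split):
            print(split, count_images_per_class(split))
```

A strongly unbalanced count would be worth knowing about before reading the accuracy figures later on.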

3. Data processing

  • Following the official PaddleClas instructions, the training images must be listed in two txt files
  • The split uses the conventional 0.8:0.2 ratio between training and validation
  • train_list.txt
  • val_list.txt
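
The list format and the split can be illustrated on a toy example. This is a sketch under assumptions: each line of a list file is taken to be an image path, a space, and an integer label (the format the cells below write), and split_lines with its seed parameter is a hypothetical helper, not part of PaddleClas.

```python
import random


def split_lines(lines, train_ratio=0.8, seed=0):
    """Shuffle a copy of the "path label" lines and split it into train and validation parts."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # local RNG so the split is reproducible
    train_size = int(len(shuffled) * train_ratio)
    return shuffled[:train_size], shuffled[train_size:]


# Toy lines in the "<path> <label>" format; the file names are placeholders
toy_lines = [f"Training/glioma/img_{k}.jpg 0\n" for k in range(5)] + \
            [f"Training/notumor/img_{k}.jpg 2\n" for k in range(5)]
train_part, val_part = split_lines(toy_lines)
print(len(train_part), len(val_part))  # 8 2
```

Fixing the seed is a design choice for repeatability; the cells below shuffle without a seed, so their split differs from run to run.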
#Import related packages
from sklearn.utils import shuffle
import os
import pandas as pd
import numpy as np
from PIL import Image
import paddle
import paddle.nn as nn
from paddle.io import Dataset
import paddle.vision.transforms as T
import paddle.nn.functional as F
from paddle.metric import Accuracy
import random
# Root directory of the original training set
dirpath = "Training"
# Build one list for the whole training set first and split it afterwards.
# os.walk returns the files grouped by class, so the list must be shuffled before the split.
def get_all_txt():
    all_list = []
    for root, dirs, files in os.walk(dirpath):  # current directory, its subfolders, its files
        for file in files:
            # The class folder name in the path decides the integer label
            if "glioma" in root:
                all_list.append(os.path.join(root, file) + " 0\n")
            elif "meningioma" in root:
                all_list.append(os.path.join(root, file) + " 1\n")
            elif "notumor" in root:
                all_list.append(os.path.join(root, file) + " 2\n")
            elif "pituitary" in root:
                all_list.append(os.path.join(root, file) + " 3\n")
    # "with" closes the file, so the content is flushed to disk before it is moved later
    with open('all_list.txt', 'w', encoding='utf-8') as f:
        f.write(''.join(all_list))
    # Return the number of labeled lines so the split size matches the list
    return all_list, len(all_list)

all_list,all_lenth = get_all_txt()
# Shuffle the original training set (a single shuffle is sufficient)
random.shuffle(all_list)
#Divide training set and validation set
train_size = int(all_lenth * 0.8)
train_list = all_list[:train_size]
val_list = all_list[train_size:]

print(len(train_list))
print(len(val_list))
4569
1143
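
Because the split is random rather than stratified, it may be worth checking that every class appears in both parts. A small sketch, assuming the "path label" line format used above; label_counts is a hypothetical helper.

```python
from collections import Counter


def label_counts(lines):
    """Count how many "path label" lines carry each integer label."""
    return Counter(int(line.rsplit(" ", 1)[1]) for line in lines if line.strip())


example = [
    "Training/glioma/a.jpg 0\n",
    "Training/glioma/b.jpg 0\n",
    "Training/pituitary/c.jpg 3\n",
]
print(label_counts(example))  # Counter({0: 2, 3: 1})
```

In the notebook, label_counts(train_list) and label_counts(val_list) would give the per-class totals of the two parts.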
# Run cell to generate txt 
train_txt = ''.join(train_list)
f_train = open('train_list.txt','w',encoding='utf-8')
f_train.write(train_txt)
f_train.close()
print("train_list.txt Generated successfully!")
train_list.txt Generated successfully!
# Run cell to generate txt
val_txt = ''.join(val_list)
f_val = open('val_list.txt','w',encoding='utf-8')
f_val.write(val_txt)
f_val.close()
print("val_list.txt Generated successfully!")
val_list.txt Generated successfully!
# Generate the list file for the test set
test_dirpath = "Testing"
def get_test_txt():
    test_list = []
    for root, dirs, files in os.walk(test_dirpath):
        for file in files:
            # Same folder-name-to-label mapping as for the training set
            if "glioma" in root:
                test_list.append(os.path.join(root, file) + " 0\n")
            elif "meningioma" in root:
                test_list.append(os.path.join(root, file) + " 1\n")
            elif "notumor" in root:
                test_list.append(os.path.join(root, file) + " 2\n")
            elif "pituitary" in root:
                test_list.append(os.path.join(root, file) + " 3\n")
    # "with" closes the file, so the content is flushed to disk before it is moved later
    with open('test_list.txt', 'w', encoding='utf-8') as f:
        f.write(''.join(test_list))
    return test_list, len(test_list)
test_list,test_lenth = get_test_txt()

4. Training

4.1 Move related files

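Move the images into the dataset directory under PaddleClas.
The move is deliberately done only now, as a small trick. The list files were generated while the images were still in the working directory, so their paths start with Training/ or Testing/ and are relative to the dataset directory once everything is moved. Had the images been moved first, the generated txt files would contain the full path, part of which would then have to be removed.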
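
To make the path reasoning concrete: the config further down sets data_dir to the dataset folder, and the assumption here is that the loader resolves each image by joining data_dir with the path from the list file. The sketch below shows that join, and how a longer path could be reduced to a relative one with os.path.relpath if the lists had been generated after the move. The file name is a placeholder.

```python
import os

data_dir = "/home/aistudio/PaddleClas-release-2.1/dataset/"

# A line as written by the cells above, with a path relative to the working directory
line = "Training/glioma/example.jpg 0\n"
rel_path, label = line.strip().rsplit(" ", 1)
resolved = os.path.join(data_dir, rel_path)
print(resolved)

# If a list had been generated with full paths instead, the prefix could be stripped like this
full_path = "/home/aistudio/PaddleClas-release-2.1/dataset/Training/glioma/example.jpg"
stripped = os.path.relpath(full_path, start=data_dir)
print(stripped)
```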

!mv Training/ PaddleClas-release-2.1/dataset/
!mv all_list.txt PaddleClas-release-2.1/dataset/
!mv train_list.txt PaddleClas-release-2.1/dataset/
!mv val_list.txt PaddleClas-release-2.1/dataset/
!mv test_list.txt PaddleClas-release-2.1/dataset/
!mv Testing/ PaddleClas-release-2.1/dataset/
%cd PaddleClas-release-2.1
!ls
/home/aistudio/PaddleClas-release-2.1
configs  docs	      MANIFEST.in    README_cn.md      setup.py
dataset  __init__.py  paddleclas.py  README.md	       tools
deploy	 LICENSE      ppcls	     requirements.txt

4.2 Configure related parameters

/home/aistudio/PaddleClas-release-2.1/configs/ResNet/ResNet50.yaml
mode: 'train'
ARCHITECTURE:
    name: 'ResNet50'

pretrained_model: ""
model_save_dir: "./output/"
classes_num: 4
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 4
image_shape: [3, 512, 512]

use_mix: False
ls_epsilon: -1

LEARNING_RATE:
    function: 'Piecewise'          
    params:                   
        lr: 0.1               
        decay_epochs: [30, 60, 90] 
        gamma: 0.1 

OPTIMIZER:
    function: 'Momentum'
    params:
        momentum: 0.9
    regularizer:
        function: 'L2'
        factor: 0.000100

TRAIN:
    batch_size: 64
    num_workers: 0
    file_list: "/home/aistudio/PaddleClas-release-2.1/dataset/train_list.txt"
    data_dir: "/home/aistudio/PaddleClas-release-2.1/dataset/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1./255.
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:

VALID:
    batch_size: 64
    num_workers: 4
    file_list: "/home/aistudio/PaddleClas-release-2.1/dataset/val_list.txt"
    data_dir: "/home/aistudio/PaddleClas-release-2.1/dataset/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            resize_short: 256
        - CropImage:
            size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:
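
The LEARNING_RATE block describes a piecewise-constant schedule: the rate starts at lr and is multiplied by gamma each time one of the decay_epochs is reached. The sketch below reproduces that rule at epoch granularity. It is an illustration, not PaddleClas code: the function name is hypothetical, and the switch is assumed to happen at the start of each listed epoch.

```python
def piecewise_lr(epoch, lr=0.1, decay_epochs=(30, 60, 90), gamma=0.1):
    """Learning rate for a 0-based epoch: lr times gamma once per decay boundary already reached."""
    reached = sum(1 for boundary in decay_epochs if epoch >= boundary)
    return lr * gamma ** reached


# Print the rate just before and at each boundary, and in the last epoch
for epoch in (0, 29, 30, 60, 90, 119):
    print(epoch, round(piecewise_lr(epoch), 6))
```

With the values from the config, the rate is 0.1 for epochs 0 to 29, 0.01 for epochs 30 to 59, 0.001 for epochs 60 to 89, and 0.0001 from epoch 90 to the end of the 120 epochs.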

4.3 Start training

#start training
!python tools/train.py \
    -c /home/aistudio/PaddleClas-release-2.1/configs/ResNet/ResNet50.yaml
2022-11-29 14:46:35 INFO: 
===========================================================
==        PaddleClas is powered by PaddlePaddle !        ==
===========================================================
==                                                       ==
==   For more info please go to the following website.   ==
==                                                       ==
==       https://github.com/PaddlePaddle/PaddleClas      ==
===========================================================

2022-11-29 14:46:35 INFO: ARCHITECTURE : 
2022-11-29 14:46:35 INFO:     name : ResNet50
2022-11-29 14:46:35 INFO: ------------------------------------------------------------
2022-11-29 14:46:35 INFO: LEARNING_RATE : 
2022-11-29 14:46:35 INFO:     function : Piecewise
2022-11-29 14:46:35 INFO:     params : 
2022-11-29 14:46:35 INFO:         decay_epochs : [30, 60, 90]
2022-11-29 14:46:35 INFO:         gamma : 0.1
2022-11-29 14:46:35 INFO:         lr : 0.1
2022-11-29 14:46:35 INFO: ------------------------------------------------------------
2022-11-29 14:46:35 INFO: OPTIMIZER : 
2022-11-29 14:46:35 INFO:     function : Momentum
2022-11-29 14:46:35 INFO:     params : 
2022-11-29 14:46:35 INFO:         momentum : 0.9
2022-11-29 14:46:35 INFO:     regularizer : 
2022-11-29 14:46:35 INFO:         factor : 0.0001
2022-11-29 14:46:35 INFO:         function : L2
2022-11-29 14:46:35 INFO: ------------------------------------------------------------
2022-11-29 14:46:35 INFO: TRAIN : 
2022-11-29 14:46:35 INFO:     batch_size : 64
2022-11-29 14:46:35 INFO:     data_dir : /home/aistudio/PaddleClas-release-2.1/dataset/
2022-11-29 14:46:35 INFO:     file_list : /home/aistudio/PaddleClas-release-2.1/dataset/train_list.txt
2022-11-29 14:46:35 INFO:     num_workers : 0
2022-11-29 14:46:35 INFO:     shuffle_seed : 0
2022-11-29 14:46:35 INFO:     transforms : 
2022-11-29 14:46:35 INFO:         DecodeImage : 
2022-11-29 14:46:35 INFO:             channel_first : False
2022-11-29 14:46:35 INFO:             to_rgb : True
2022-11-29 14:46:35 INFO:         RandCropImage : 
2022-11-29 14:46:35 INFO:             size : 224
2022-11-29 14:46:35 INFO:         RandFlipImage : 
2022-11-29 14:46:35 INFO:             flip_code : 1
2022-11-29 14:46:35 INFO:         NormalizeImage : 
2022-11-29 14:46:35 INFO:             mean : [0.485, 0.456, 0.406]
2022-11-29 14:46:35 INFO:             order : 
2022-11-29 14:46:35 INFO:             scale : 1./255.
2022-11-29 14:46:35 INFO:             std : [0.229, 0.224, 0.225]
2022-11-29 14:46:35 INFO:         ToCHWImage : None
2022-11-29 14:46:35 INFO: ------------------------------------------------------------
2022-11-29 14:46:35 INFO: VALID : 
2022-11-29 14:46:35 INFO:     batch_size : 64
2022-11-29 14:46:35 INFO:     data_dir : /home/aistudio/PaddleClas-release-2.1/dataset/
2022-11-29 14:46:35 INFO:     file_list : /home/aistudio/PaddleClas-release-2.1/dataset/val_list.txt
2022-11-29 14:46:35 INFO:     num_workers : 4
2022-11-29 14:46:35 INFO:     shuffle_seed : 0
2022-11-29 14:46:35 INFO:     transforms : 
2022-11-29 14:46:35 INFO:         DecodeImage : 
2022-11-29 14:46:35 INFO:             channel_first : False
2022-11-29 14:46:35 INFO:             to_rgb : True
2022-11-29 14:46:35 INFO:         ResizeImage : 
2022-11-29 14:46:35 INFO:             resize_short : 256
2022-11-29 14:46:35 INFO:         CropImage : 
2022-11-29 14:46:35 INFO:             size : 224
2022-11-29 14:46:35 INFO:         NormalizeImage : 
2022-11-29 14:46:35 INFO:             mean : [0.485, 0.456, 0.406]
2022-11-29 14:46:35 INFO:             order : 
2022-11-29 14:46:35 INFO:             scale : 1.0/255.0
2022-11-29 14:46:35 INFO:             std : [0.229, 0.224, 0.225]
2022-11-29 14:46:35 INFO:         ToCHWImage : None
2022-11-29 14:46:35 INFO: ------------------------------------------------------------
2022-11-29 14:46:35 INFO: classes_num : 4
2022-11-29 14:46:35 INFO: epochs : 120
2022-11-29 14:46:35 INFO: image_shape : [3, 512, 512]
2022-11-29 14:46:35 INFO: ls_epsilon : -1
2022-11-29 14:46:35 INFO: mode : train
2022-11-29 14:46:35 INFO: model_save_dir : ./output/
2022-11-29 14:46:35 INFO: pretrained_model : 
2022-11-29 14:46:35 INFO: save_interval : 1
2022-11-29 14:46:35 INFO: topk : 4
2022-11-29 14:46:35 INFO: total_images : 1281167
2022-11-29 14:46:35 INFO: use_mix : False
2022-11-29 14:46:35 INFO: valid_interval : 1
2022-11-29 14:46:35 INFO: validate : True
W1129 14:46:35.077515  1228 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1129 14:46:35.082067  1228 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
2022-11-29 14:46:39 INFO: epoch:0  , train step:0   , top1: 0.28125, top4: 1.00000, loss: 1.58380, lr: 0.100000, batch_cost: 2.82949 s, reader_cost: 0.54911 s, ips: 22.61888 images/sec, eta: 6:41:47
2022-11-29 14:46:42 INFO: epoch:0  , train step:10  , top1: 0.46875, top4: 1.00000, loss: 1.15485, lr: 0.100000, batch_cost: 0.41718 s, reader_cost: 0.23281 s, ips: 153.41036 images/sec, eta: 0:59:10
^C
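
The progress lines can be read mechanically, for example to plot the loss later. The sketch below parses one of the lines shown above with a regular expression. parse_train_line is a hypothetical helper, and the pattern only covers the fields seen in this log. It also checks the relation ips ≈ batch_size / batch_cost, which both logged lines satisfy for the configured batch size of 64.

```python
import re

LINE = ("2022-11-29 14:46:42 INFO: epoch:0  , train step:10  , top1: 0.46875, top4: 1.00000, "
        "loss: 1.15485, lr: 0.100000, batch_cost: 0.41718 s, reader_cost: 0.23281 s, "
        "ips: 153.41036 images/sec, eta: 0:59:10")

PATTERN = re.compile(
    r"epoch:(?P<epoch>\d+)\s*,\s*train step:(?P<step>\d+)\s*,\s*top1: (?P<top1>[\d.]+),"
    r".*?loss: (?P<loss>[\d.]+),.*?batch_cost: (?P<batch_cost>[\d.]+) s,"
    r".*?ips: (?P<ips>[\d.]+) images/sec"
)


def parse_train_line(line):
    """Return the numeric fields of a training log line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = {name: float(value) for name, value in match.groupdict().items()}
    fields["epoch"], fields["step"] = int(fields["epoch"]), int(fields["step"])
    return fields


metrics = parse_train_line(LINE)
print(metrics["loss"], metrics["top1"])
print(64 / metrics["batch_cost"])  # close to the logged ips value
```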
# Configure the evaluation file (configs/eval.yaml, read by the command below) with the following parameters
# mode: 'valid'
# ARCHITECTURE:
#     name: "ResNet50"

# pretrained_model: "/home/aistudio/PaddleClas-release-2.1/output/ResNet50/best_model/ppcls"
# classes_num: 4
# total_images: 1311
# topk: 4
# image_shape: [3, 512, 512]

# VALID:
#     batch_size: 16
#     num_workers: 0
#     file_list: "/home/aistudio/PaddleClas-release-2.1/dataset/test_list.txt"
#     data_dir: "/home/aistudio/PaddleClas-release-2.1/dataset/"
#     shuffle_seed: 0
#     transforms:
#         - DecodeImage:
#             to_rgb: True
#             channel_first: False
#         - ResizeImage:
#             resize_short: 256
#         - CropImage:
#             size: 224
#         - NormalizeImage:
#             scale: 1.0/255.0
#             mean: [0.485, 0.456, 0.406]
#             std: [0.229, 0.224, 0.225]
#             order: ''
#         - ToCHWImage:


# Start evaluation
!python tools/eval.py \
    -c /home/aistudio/PaddleClas-release-2.1/configs/eval.yaml
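
The evaluation transforms (ResizeImage with resize_short 256, CropImage with size 224, NormalizeImage) come down to a little arithmetic. The sketch below works through it in plain Python. The function names are hypothetical, and the rounding and the centered crop are assumptions about how the transforms behave, not a copy of the PaddleClas implementation.

```python
def resize_short_size(width, height, resize_short=256):
    """New (width, height) after scaling so that the shorter side equals resize_short."""
    scale = resize_short / min(width, height)
    return round(width * scale), round(height * scale)


def center_crop_box(width, height, size=224):
    """(left, top, right, bottom) of a centered size x size crop."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def normalize_pixel(rgb, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Scale 0..255 channel values by 1/255, then subtract the mean and divide by the std."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))


w, h = resize_short_size(512, 384)
print((w, h), center_crop_box(w, h))
print(normalize_pixel((255, 255, 255)))
```

The mean and std values are the ones listed in the config; they are applied per channel after the 1/255 scaling.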

5. Prediction

# Run prediction on the glioma test images
!python tools/infer/infer.py \
    -i /home/aistudio/PaddleClas-release-2.1/dataset/Testing/glioma \
    --model ResNet50 \
    --pretrained_model "output/ResNet50/best_model/ppcls" \
    --load_static_weights False \
    --class_num=4
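
The integer labels follow the mapping used when the list files were generated (0 glioma, 1 meningioma, 2 notumor, 3 pituitary). Below is a small sketch of turning a vector of class scores into a readable label. The scores are made-up numbers, and nothing is assumed here about the output format of infer.py.

```python
# Index in this list = integer label written to the list files
CLASS_NAMES = ["glioma", "meningioma", "notumor", "pituitary"]


def top1_label(scores):
    """Return (class name, score) for the highest of the four class scores."""
    if len(scores) != len(CLASS_NAMES):
        raise ValueError("expected one score per class")
    best = max(range(len(scores)), key=lambda k: scores[k])
    return CLASS_NAMES[best], scores[best]


print(top1_label([0.91, 0.05, 0.01, 0.03]))  # ('glioma', 0.91)
```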

Related Information

Mentor: Lin Xu
Student: Liu Yang

This article was reposted from the original project.
Original project link: https://aistudio.baidu.com/aistudio/projectdetail/5256517?forkThirdPart=1

Tags: Python

Posted by alanho on Thu, 15 Dec 2022 22:28:46 +1030