Radiological Genomic Classification of Brain Tumors Based on PaddleClas
1. Project introduction
AI Talent Special Training Camp Phase II
Malignant brain tumors are life-threatening. Glioblastoma is both the most common form of brain cancer in adults and the one with the worst prognosis, with a median survival of less than a year. The presence in the tumor of a specific genetic sequence known as MGMT promoter methylation has been shown to be a favorable prognostic factor and a strong predictor of response to chemotherapy.
Currently, genetic analysis of cancer requires surgery to extract a tissue sample, and it can then take several weeks to determine the genetic characteristics of the tumor. Depending on the results and the initial treatment chosen, follow-up surgery may be required. An accurate method for predicting tumor genetics from imaging alone (i.e., radiogenomics) could reduce the number of surgeries and refine the choice of treatment. The Radiological Society of North America (RSNA) is partnering with the Medical Image Computing and Computer Assisted Intervention Society (MICCAI Society) to improve diagnosis and treatment planning for patients with glioblastoma.
2. Dataset
2.1 Dataset Introduction
The dataset contains a total of 7022 images, divided into a training set and a test set. Each image belongs to one of four classes: glioma, meningioma, no tumor, and pituitary.
2.2 Decompress the dataset
# Dataset
!unzip /home/aistudio/data/data180229/archive.zip
# PaddleClas
!unzip /home/aistudio/data/data90342/PaddleClas-release-2.1.zip
3. Data processing
- Following the official PaddleClas instructions, the training images must be indexed in two txt list files
- The images are split with the classic 0.8:0.2 ratio into:
- train_list.txt
- val_list.txt
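Each line of these list files is assumed to hold an image path and an integer label separated by a single space, which is the layout the cells below produce. A minimal sketch of writing and reading one such line follows; the helper names `make_list_line` and `parse_list_line` are hypothetical and are not part of PaddleClas.

```python
from typing import Tuple


def make_list_line(rel_path: str, label: int) -> str:
    """Format one list-file entry as '<path> <label>' plus a newline."""
    return f"{rel_path} {label}\n"


def parse_list_line(line: str) -> Tuple[str, int]:
    """Split an entry on its last space so that paths containing spaces survive."""
    path, label = line.rstrip("\n").rsplit(" ", 1)
    return path, int(label)


if __name__ == "__main__":
    # The file name below is made up for illustration.
    line = make_list_line("Training/glioma/example.jpg", 0)
    print(repr(line))
    print(parse_list_line(line))
```

Splitting on the last space rather than the first keeps the label intact even if a file name contains spaces.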
# Import related packages
from sklearn.utils import shuffle
import os
import pandas as pd
import numpy as np
from PIL import Image
import paddle
import paddle.nn as nn
from paddle.io import Dataset
import paddle.vision.transforms as T
import paddle.nn.functional as F
from paddle.metric import Accuracy
import random
# Get the original training set
import os

dirpath = "Training"

# Build the full list first and split it afterwards. The list comes out ordered
# by class, so it must be shuffled before the validation set is split off.
def get_all_txt():
    all_list = []
    i = 0
    # os.walk yields the current directory, its subfolders, and its files
    for root, dirs, files in os.walk(dirpath):
        for file in files:
            i = i + 1
            if "glioma" in root:
                all_list.append(os.path.join(root, file) + " 0\n")
            elif "meningioma" in root:
                all_list.append(os.path.join(root, file) + " 1\n")
            elif "notumor" in root:
                all_list.append(os.path.join(root, file) + " 2\n")
            elif "pituitary" in root:
                all_list.append(os.path.join(root, file) + " 3\n")
    allstr = ''.join(all_list)
    # Write the complete list and close the file when done
    with open('all_list.txt', 'w', encoding='utf-8') as f:
        f.write(allstr)
    return all_list, i

all_list, all_lenth = get_all_txt()
# Shuffle the original training set
random.shuffle(all_list)
# Divide training set and validation set
train_size = int(all_lenth * 0.8)
train_list = all_list[:train_size]
val_list = all_list[train_size:]
print(len(train_list))
print(len(val_list))
4569
1143
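Because the shuffle above is unseeded, each run yields a different split. A minimal sketch of a reproducible variant is shown here, assuming a fixed seed is acceptable; the function name `split_list` and the seed value are illustrative choices, not part of the original project.

```python
import random
from typing import List, Sequence, Tuple


def split_list(items: Sequence, train_ratio: float = 0.8,
               seed: int = 0) -> Tuple[List, List]:
    """Shuffle a copy of `items` with a private RNG, then cut it at train_ratio."""
    shuffled = list(items)
    # A dedicated Random instance leaves the global random state untouched
    random.Random(seed).shuffle(shuffled)
    train_size = int(len(shuffled) * train_ratio)
    return shuffled[:train_size], shuffled[train_size:]


if __name__ == "__main__":
    # 5712 stand-in entries, the same count as the two sizes printed above combined
    train, val = split_list(range(5712))
    print(len(train), len(val))
```

With 5712 entries this reproduces the sizes printed above, since `int(5712 * 0.8)` is 4569.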
# Run cell to generate txt
train_txt = ''.join(train_list)
with open('train_list.txt', 'w', encoding='utf-8') as f_train:
    f_train.write(train_txt)
print("train_list.txt Generated successfully!")
train_list.txt Generated successfully!
# Run cell to generate txt
val_txt = ''.join(val_list)
with open('val_list.txt', 'w', encoding='utf-8') as f_val:
    f_val.write(val_txt)
print("val_list.txt Generated successfully!")
val_list.txt Generated successfully!
# Generate a list of test set data
import os

test_dirpath = "Testing"

def get_test_txt():
    test_list = []
    i = 0
    for root, dirs, files in os.walk(test_dirpath):
        for file in files:
            i = i + 1
            if "glioma" in root:
                test_list.append(os.path.join(root, file) + " 0\n")
            elif "meningioma" in root:
                test_list.append(os.path.join(root, file) + " 1\n")
            elif "notumor" in root:
                test_list.append(os.path.join(root, file) + " 2\n")
            elif "pituitary" in root:
                test_list.append(os.path.join(root, file) + " 3\n")
    test_str = ''.join(test_list)
    # Write the test list and close the file when done
    with open('test_list.txt', 'w', encoding='utf-8') as f:
        f.write(test_str)
    return test_list, i

test_list, test_lenth = get_test_txt()
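Before training, it can help to check how many entries each class received. The following is a minimal sketch, assuming the label assignment used in the cells above (0 glioma, 1 meningioma, 2 notumor, 3 pituitary); the helper `count_labels` is hypothetical and the sample lines are made up.

```python
from collections import Counter
from typing import Dict, Iterable

# Label order assumed from the list-generation cells above
CLASS_NAMES = ["glioma", "meningioma", "notumor", "pituitary"]


def count_labels(lines: Iterable[str]) -> Dict[str, int]:
    """Count entries per class from '<path> <label>' lines."""
    counts = Counter(
        int(line.rsplit(" ", 1)[1]) for line in lines if line.strip()
    )
    return {name: counts.get(idx, 0) for idx, name in enumerate(CLASS_NAMES)}


if __name__ == "__main__":
    sample = [
        "Testing/glioma/a.jpg 0\n",
        "Testing/glioma/b.jpg 0\n",
        "Testing/pituitary/c.jpg 3\n",
    ]
    print(count_labels(sample))
```

Passing `train_list`, `val_list`, or `test_list` in place of `sample` would show whether the random split left any class badly under-represented.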
4. Training
4.1 Move related files
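Move the images and list files into the dataset directory under PaddleClas.
The files are moved only now, after the lists have been generated, as a small trick: the paths written to the txt files are then relative (starting at Training/ or Testing/) and can be used as they are. Had the images been moved first, the generated paths would carry the longer path down to the dataset directory, and that leading part would have to be stripped.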
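The reasoning above can be illustrated with a short sketch. It assumes the data loader resolves each image by joining `data_dir` with the path stored in the list file; that behavior is an assumption here, not something confirmed from the PaddleClas source. The helper `resolve_image_path` and the file name are hypothetical.

```python
import os

# data_dir as configured for training in this project
DATA_DIR = "/home/aistudio/PaddleClas-release-2.1/dataset/"


def resolve_image_path(data_dir: str, list_line: str) -> str:
    """Join data_dir with the relative path stored in a '<path> <label>' line."""
    rel_path, _label = list_line.strip().rsplit(" ", 1)
    return os.path.join(data_dir, rel_path)


if __name__ == "__main__":
    # A relative entry, as produced when the lists are generated before the move
    print(resolve_image_path(DATA_DIR, "Training/glioma/example.jpg 0\n"))
```

Under that assumption, an entry that already contained the path down to the dataset directory would be joined onto `data_dir` a second time and would no longer point at the image.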
!mv Training/ PaddleClas-release-2.1/dataset/
!mv all_list.txt PaddleClas-release-2.1/dataset/
!mv train_list.txt PaddleClas-release-2.1/dataset/
!mv val_list.txt PaddleClas-release-2.1/dataset/
!mv test_list.txt PaddleClas-release-2.1/dataset/
!mv Testing/ PaddleClas-release-2.1/dataset/
%cd PaddleClas-release-2.1
!ls
/home/aistudio/PaddleClas-release-2.1 configs docs MANIFEST.in README_cn.md setup.py dataset __init__.py paddleclas.py README.md tools deploy LICENSE ppcls requirements.txt
4.2 Configure related parameters
Edit the configuration file /home/aistudio/PaddleClas-release-2.1/configs/ResNet/ResNet50.yaml as follows:
mode: 'train'
ARCHITECTURE:
    name: 'ResNet50'

pretrained_model: ""
model_save_dir: "./output/"
classes_num: 4
total_images: 1281167
save_interval: 1
validate: True
valid_interval: 1
epochs: 120
topk: 4
image_shape: [3, 512, 512]

use_mix: False
ls_epsilon: -1

LEARNING_RATE:
    function: 'Piecewise'
    params:
        lr: 0.1
        decay_epochs: [30, 60, 90]
        gamma: 0.1

OPTIMIZER:
    function: 'Momentum'
    params:
        momentum: 0.9
    regularizer:
        function: 'L2'
        factor: 0.000100

TRAIN:
    batch_size: 64
    num_workers: 0
    file_list: "/home/aistudio/PaddleClas-release-2.1/dataset/train_list.txt"
    data_dir: "/home/aistudio/PaddleClas-release-2.1/dataset/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1./255.
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:

VALID:
    batch_size: 64
    num_workers: 4
    file_list: "/home/aistudio/PaddleClas-release-2.1/dataset/val_list.txt"
    data_dir: "/home/aistudio/PaddleClas-release-2.1/dataset/"
    shuffle_seed: 0
    transforms:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            resize_short: 256
        - CropImage:
            size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:
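The LEARNING_RATE block describes a piecewise (step) schedule: the rate starts at lr and is multiplied by gamma at each entry in decay_epochs. The sketch below works through the values for this configuration in plain Python. It assumes the decay takes effect from the listed epoch onward; the exact boundary handling inside PaddleClas may differ, and the function `piecewise_lr` is illustrative only.

```python
from typing import Sequence


def piecewise_lr(epoch: int, base_lr: float = 0.1,
                 decay_epochs: Sequence[int] = (30, 60, 90),
                 gamma: float = 0.1) -> float:
    """Return base_lr scaled by gamma once for every decay epoch already reached."""
    # Assumption: the decay applies from the listed epoch itself (epoch >= boundary)
    n_decays = sum(epoch >= boundary for boundary in decay_epochs)
    return base_lr * gamma ** n_decays


if __name__ == "__main__":
    for epoch in (0, 29, 30, 60, 90, 119):
        print(epoch, f"{piecewise_lr(epoch):.6f}")
```

Over the 120 configured epochs the rate therefore steps through roughly 0.1, 0.01, 0.001, and 0.0001.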
4.3 Start training
# Start training
!python tools/train.py \
    -c /home/aistudio/PaddleClas-release-2.1/configs/ResNet/ResNet50.yaml
2022-11-29 14:46:35 INFO:
===========================================================
==        PaddleClas is powered by PaddlePaddle !        ==
===========================================================
==                                                       ==
==   For more info please go to the following website.   ==
==                                                       ==
==       https://github.com/PaddlePaddle/PaddleClas      ==
===========================================================
2022-11-29 14:46:35 INFO: ARCHITECTURE :
2022-11-29 14:46:35 INFO:     name : ResNet50
2022-11-29 14:46:35 INFO: ------------------------------------------------------------
2022-11-29 14:46:35 INFO: LEARNING_RATE :
2022-11-29 14:46:35 INFO:     function : Piecewise
2022-11-29 14:46:35 INFO:     params :
2022-11-29 14:46:35 INFO:         decay_epochs : [30, 60, 90]
2022-11-29 14:46:35 INFO:         gamma : 0.1
2022-11-29 14:46:35 INFO:         lr : 0.1
2022-11-29 14:46:35 INFO: ------------------------------------------------------------
2022-11-29 14:46:35 INFO: OPTIMIZER :
2022-11-29 14:46:35 INFO:     function : Momentum
2022-11-29 14:46:35 INFO:     params :
2022-11-29 14:46:35 INFO:         momentum : 0.9
2022-11-29 14:46:35 INFO:     regularizer :
2022-11-29 14:46:35 INFO:         factor : 0.0001
2022-11-29 14:46:35 INFO:         function : L2
2022-11-29 14:46:35 INFO: ------------------------------------------------------------
2022-11-29 14:46:35 INFO: TRAIN :
2022-11-29 14:46:35 INFO:     batch_size : 64
2022-11-29 14:46:35 INFO:     data_dir : /home/aistudio/PaddleClas-release-2.1/dataset/
2022-11-29 14:46:35 INFO:     file_list : /home/aistudio/PaddleClas-release-2.1/dataset/train_list.txt
2022-11-29 14:46:35 INFO:     num_workers : 0
2022-11-29 14:46:35 INFO:     shuffle_seed : 0
2022-11-29 14:46:35 INFO:     transforms :
2022-11-29 14:46:35 INFO:         DecodeImage :
2022-11-29 14:46:35 INFO:             channel_first : False
2022-11-29 14:46:35 INFO:             to_rgb : True
2022-11-29 14:46:35 INFO:         RandCropImage :
2022-11-29 14:46:35 INFO:             size : 224
2022-11-29 14:46:35 INFO:         RandFlipImage :
2022-11-29 14:46:35 INFO:             flip_code : 1
2022-11-29 14:46:35 INFO:         NormalizeImage :
2022-11-29 14:46:35 INFO:             mean : [0.485, 0.456, 0.406]
2022-11-29 14:46:35 INFO:             order :
2022-11-29 14:46:35 INFO:             scale : 1./255.
2022-11-29 14:46:35 INFO:             std : [0.229, 0.224, 0.225]
2022-11-29 14:46:35 INFO:         ToCHWImage : None
2022-11-29 14:46:35 INFO: ------------------------------------------------------------
2022-11-29 14:46:35 INFO: VALID :
2022-11-29 14:46:35 INFO:     batch_size : 64
2022-11-29 14:46:35 INFO:     data_dir : /home/aistudio/PaddleClas-release-2.1/dataset/
2022-11-29 14:46:35 INFO:     file_list : /home/aistudio/PaddleClas-release-2.1/dataset/val_list.txt
2022-11-29 14:46:35 INFO:     num_workers : 4
2022-11-29 14:46:35 INFO:     shuffle_seed : 0
2022-11-29 14:46:35 INFO:     transforms :
2022-11-29 14:46:35 INFO:         DecodeImage :
2022-11-29 14:46:35 INFO:             channel_first : False
2022-11-29 14:46:35 INFO:             to_rgb : True
2022-11-29 14:46:35 INFO:         ResizeImage :
2022-11-29 14:46:35 INFO:             resize_short : 256
2022-11-29 14:46:35 INFO:         CropImage :
2022-11-29 14:46:35 INFO:             size : 224
2022-11-29 14:46:35 INFO:         NormalizeImage :
2022-11-29 14:46:35 INFO:             mean : [0.485, 0.456, 0.406]
2022-11-29 14:46:35 INFO:             order :
2022-11-29 14:46:35 INFO:             scale : 1.0/255.0
2022-11-29 14:46:35 INFO:             std : [0.229, 0.224, 0.225]
2022-11-29 14:46:35 INFO:         ToCHWImage : None
2022-11-29 14:46:35 INFO: ------------------------------------------------------------
2022-11-29 14:46:35 INFO: classes_num : 4
2022-11-29 14:46:35 INFO: epochs : 120
2022-11-29 14:46:35 INFO: image_shape : [3, 512, 512]
2022-11-29 14:46:35 INFO: ls_epsilon : -1
2022-11-29 14:46:35 INFO: mode : train
2022-11-29 14:46:35 INFO: model_save_dir : ./output/
2022-11-29 14:46:35 INFO: pretrained_model :
2022-11-29 14:46:35 INFO: save_interval : 1
2022-11-29 14:46:35 INFO: topk : 4
2022-11-29 14:46:35 INFO: total_images : 1281167
2022-11-29 14:46:35 INFO: use_mix : False
2022-11-29 14:46:35 INFO: valid_interval : 1
2022-11-29 14:46:35 INFO: validate : True
W1129 14:46:35.077515 1228 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1129 14:46:35.082067 1228 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
2022-11-29 14:46:39 INFO: epoch:0 , train step:0 , top1: 0.28125, top4: 1.00000, loss: 1.58380, lr: 0.100000, batch_cost: 2.82949 s, reader_cost: 0.54911 s, ips: 22.61888 images/sec, eta: 6:41:47
2022-11-29 14:46:42 INFO: epoch:0 , train step:10 , top1: 0.46875, top4: 1.00000, loss: 1.15485, lr: 0.100000, batch_cost: 0.41718 s, reader_cost: 0.23281 s, ips: 153.41036 images/sec, eta: 0:59:10
^C
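The log reports top4: 1.00000 from the very first step. With topk: 4 and only four classes, every label is always among the four highest-scoring predictions, so this metric carries no information and top1 is the figure to watch. A minimal sketch of top-k accuracy in plain Python makes this concrete; the scores below are invented for illustration and `topk_accuracy` is not a PaddleClas function.

```python
from typing import List, Sequence


def topk_accuracy(scores: Sequence[Sequence[float]],
                  labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample
        topk: List[int] = sorted(
            range(len(row)), key=lambda i: row[i], reverse=True
        )[:k]
        hits += label in topk
    return hits / len(labels)


if __name__ == "__main__":
    # Invented scores for three samples over the four classes
    scores = [
        [0.1, 0.6, 0.2, 0.1],
        [0.7, 0.1, 0.1, 0.1],
        [0.2, 0.2, 0.5, 0.1],
    ]
    labels = [1, 2, 3]
    print("top1:", topk_accuracy(scores, labels, 1))
    print("top4:", topk_accuracy(scores, labels, 4))
```

Only the first sample is correct at k=1, yet all three count as hits at k=4 because the top four indices cover every class.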
# Configure the evaluation file and modify the related parameters:
# mode: 'valid'
# ARCHITECTURE:
#     name: "ResNet50"
# pretrained_model: "/home/aistudio/PaddleClas-release-2.1/output/ResNet50/best_model/ppcls"
# classes_num: 4
# total_images: 1311
# topk: 4
# image_shape: [3, 512, 512]
# VALID:
#     batch_size: 16
#     num_workers: 0
#     file_list: "/home/aistudio/PaddleClas-release-2.1/dataset/test_list.txt"
#     data_dir: "/home/aistudio/PaddleClas-release-2.1/dataset/"
#     shuffle_seed: 0
#     transforms:
#         - DecodeImage:
#             to_rgb: True
#             channel_first: False
#         - ResizeImage:
#             resize_short: 256
#         - CropImage:
#             size: 224
#         - NormalizeImage:
#             scale: 1.0/255.0
#             mean: [0.485, 0.456, 0.406]
#             std: [0.229, 0.224, 0.225]
#             order: ''
#         - ToCHWImage:
# Start evaluation
!python tools/eval.py \
    -c /home/aistudio/PaddleClas-release-2.1/configs/eval.yaml
5. Prediction
# Predict on the glioma test images
!python tools/infer/infer.py \
    -i /home/aistudio/PaddleClas-release-2.1/dataset/Testing/glioma \
    --model ResNet50 \
    --pretrained_model "output/ResNet50/best_model/ppcls" \
    --load_static_weights False \
    --class_num=4
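Predictions come back as class indices, so they need to be mapped to class names to be readable. The sketch below assumes the label assignment used when the list files were generated (0 glioma, 1 meningioma, 2 notumor, 3 pituitary). The helper `describe_prediction` and the example values are hypothetical, and the sketch does not reproduce the actual output format of infer.py.

```python
from typing import List, Sequence

# Label assignment assumed from the list-generation step of this project
ID_TO_NAME = {0: "glioma", 1: "meningioma", 2: "notumor", 3: "pituitary"}


def describe_prediction(class_ids: Sequence[int],
                        scores: Sequence[float]) -> List[str]:
    """Pair each predicted class index with its name and score."""
    return [f"{ID_TO_NAME[c]} ({s:.3f})" for c, s in zip(class_ids, scores)]


if __name__ == "__main__":
    # Invented values for one image: best guess first
    print(describe_prediction([0, 3], [0.91234, 0.05]))
```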
Related Information
Mentor: Lin Xu
Student: Liu Yang
This article is a repost.
Original project link: https://aistudio.baidu.com/aistudio/projectdetail/5256517?forkThirdPart=1