Contents
Google Cloud Platform (GCP) deep learning virtual machine (VM) (recommended!)
Installation and use of YOLOv3
Combination of OpenCV and YOLOv3
Third-party open-source code
Using a free deep learning environment
Google Cloud Platform (GCP) deep learning virtual machine (VM) (recommended!)
You can get $300 of free credit.
Tutorial: GCP Quickstart · ultralytics/yolov5 Wiki · GitHub
Google Colab Notebook
Free to use, with a GPU provided
Tutorial: https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb
Amazon Web Services
Free trial quota
Link: AWS Free Tier for overseas-region accounts (free cloud services)
Dataset annotation tools
Roboflow (recommended!)
Official website: Overview - Roboflow
Tutorial: How to Train YOLOv5 On a Custom Dataset
CVAT
Official website: Computer Vision Annotation Tool
Tutorial: How to use CVAT for computer vision [2022 updates]
LabelImg
Official website: GitHub - tzutalin/labelImg: 🖍️ LabelImg is a graphical image annotation tool and label object bounding boxes in images
Tutorial: LabelImg for computer vision annotation
Training visualization
wandb (recommended!)
Official website: Weights & Biases
tensorboardX
Official website: GitHub - lanpa/tensorboardX: tensorboard for pytorch (and chainer, mxnet, numpy, ...)
Installation and use of YOLOv3
Official website: YOLO: Real-Time Object Detection
Official Github: GitHub - ultralytics/yolov3: YOLOv3 in PyTorch > ONNX > CoreML > TFLite
Official documents: YOLOv5 Documentation (the Ultralytics YOLOv3 repo shares the YOLOv5 documentation)
Official papers: https://arxiv.org/abs/1804.02767v1
Custom training with YOLOv3
Tutorial: Training YOLOv3: Deep Learning based Custom Object Detector | LearnOpenCV
Training uses the Darknet framework, which is written in C.
1. Download and compile
cd ~
git clone https://github.com/pjreddie/darknet
cd darknet
# nproc shows the number of available cores
make -j4
2. Prepare dataset
Download the dataset and split it into a training set (70%-90%) and a test set (10%-30%).
Example dataset-split script, splitTrainAndTest.py:
import random
import os
import sys

def split_data_set(image_dir):
    f_val = open("test.txt", 'w')
    f_train = open("train.txt", 'w')

    path, dirs, files = next(os.walk(image_dir))
    data_size = len(files)
    ind = 0
    data_test_size = int(0.1 * data_size)
    test_array = random.sample(range(data_size), k=data_test_size)

    for f in os.listdir(image_dir):
        if f.endswith(".jpg"):  # safer than f.split(".")[1] for names containing extra dots
            ind += 1
            if ind in test_array:
                f_val.write(image_dir + '/' + f + '\n')
            else:
                f_train.write(image_dir + '/' + f + '\n')

    f_val.close()
    f_train.close()

split_data_set(sys.argv[1])
Usage:
python3 splitTrainAndTest.py ./path/JPEGImages/
3. Label dataset
Use labeling software to annotate the dataset. Each line of a label file describes a single bounding box in the image and contains the following information about the box:
<object-class-id> <center-x> <center-y> <width> <height>
- <object-class-id> is an integer identifying the object's class, ranging from 0 to (number of classes - 1). In our example there is only one category, so it is always 0.
- <center-x> and <center-y> are the coordinates of the center of the bounding box, normalized by the image width and height respectively (so each lies between 0 and 1).
- <width> and <height> are the width and height of the bounding box, likewise normalized by the image width and height.
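To make the normalization concrete, here is a minimal sketch of converting an ordinary pixel-corner box into a YOLO label line (the helper name to_yolo_line is made up for illustration):

def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-coordinate box to a YOLO label line.

    (x_min, y_min, x_max, y_max) are the box corners in pixels;
    all outputs are normalized to [0, 1] by the image size.
    """
    center_x = (x_min + x_max) / 2 / img_w
    center_y = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}"

# A 100x200 box centered at (320, 240) in a 640x480 image, class 0:
print(to_yolo_line(0, 270, 140, 370, 340, 640, 480))
# -> "0 0.500000 0.500000 0.156250 0.416667"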
(Screenshots omitted: the labeling software, the automatically generated category file, and the automatically generated label files.)
4. Download the pre-trained model
wget https://pjreddie.com/media/files/darknet53.conv.74
5. Prepare data file
In the .data file (here darknet.data, used in step 7 below), set the path of each required file:
classes = 1
train  = /home/sxf/Desktop/yolov3/darknet/datasets/train.txt
valid  = /home/sxf/Desktop/yolov3/darknet/datasets/test.txt
names  = /home/sxf/Desktop/yolov3/darknet/datasets/classes.names
backup = /home/sxf/Desktop/yolov3/darknet/datasets/weights/
6. YOLO parameter configuration
Use the model configuration file darknet-yolov3.cfg.
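The tutorial's cfg is already set up for one class. If you adapt yolov3.cfg yourself, the standard Darknet edits (general Darknet practice, not specific to this tutorial's file) are: in each of the three [yolo] blocks, set classes to your class count, and in the [convolutional] block directly above each one, set filters to (classes + 5) * 3. For one class that gives, at each of the three detection heads:

[convolutional]
# filters = (classes + 5) * 3 = (1 + 5) * 3
filters=18
...
[yolo]
classes=1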
7. Start training
./darknet detector train /home/sxf/Desktop/yolov3/darknet/datasets/darknet.data /home/sxf/Desktop/yolov3/darknet/datasets/darknet-yolov3.cfg ./darknet53.conv.74 > ./train.log
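Training can run for hours, and the command above redirects progress to train.log. A small sketch for pulling the average loss out of that log (parseLoss.py is a made-up name; it assumes Darknet's usual progress-line format "<iter>: <loss>, <avg> avg, ..."):

import re
import sys

# Matches Darknet progress lines such as "100: 0.651234, 0.598765 avg, ..."
PATTERN = re.compile(r"^\s*(\d+):\s*([\d.]+),\s*([\d.]+)\s+avg")

def avg_losses(log_path):
    # Yield (iteration, average loss) pairs from the training log
    with open(log_path) as log:
        for line in log:
            m = PATTERN.match(line)
            if m:
                yield int(m.group(1)), float(m.group(3))

if __name__ == "__main__":
    for iteration, avg in avg_losses(sys.argv[1]):
        print(iteration, avg)

Usage: python3 parseLoss.py train.log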
8. Test model
python3 object_detection_yolo.py --image=image.jpg
Combination of OpenCV and YOLOv3
OpenCV 4.0 already ships the DNN module, which makes it very convenient to run a trained YOLOv3 model. On CPU, the OpenCV implementation is about 9x faster than Darknet's CPU inference.
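Before the full C++ program below, a minimal Python sketch shows how little code the DNN module needs (file names assume the weights, cfg, and a test image sit in the working directory; thresholds and input size mirror the C++ code):

import cv2
import numpy as np

# Load the Darknet model with OpenCV's DNN module
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

img = cv2.imread("image.jpg")  # any test image
# Scale to [0,1], resize to the 416x416 network input, swap BGR -> RGB
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

# Forward pass through the YOLO output layers
outs = net.forward(net.getUnconnectedOutLayersNames())
# Each output row is [center_x, center_y, w, h, objectness, 80 class scores]
for out in outs:
    for det in out:
        scores = det[5:]
        class_id = int(np.argmax(scores))
        if scores[class_id] > 0.5:
            print(class_id, scores[class_id], det[:4])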
Related resources:
Installation of OpenCV:
Installing OpenCV on each platform (CSDN blog)
OpenCV tutorial collection:
GitHub - spmallick/learnopencv: Learn OpenCV : C++ and Python Examples
Possible YOLO problems:
A record of YOLO-related problems (CSDN blog)
Upgrading CMake on Ubuntu 22:
Download the weights and model configuration files:
#!/bin/bash
wget "https://raw.githubusercontent.com/spmallick/learnopencv/master/ObjectDetection-YOLO/yolov3.cfg"
wget "https://pjreddie.com/media/files/yolov3.weights"
wget "https://raw.githubusercontent.com/spmallick/learnopencv/master/ObjectDetection-YOLO/coco.names"
wget "https://raw.githubusercontent.com/spmallick/learnopencv/master/ObjectDetection-YOLO/run.mp4"
Grant execute permission and run it:
sudo chmod +x download.sh
./download.sh
Writing the C++ code
#include <iostream>
#include <fstream>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/dnn/all_layers.hpp>

using namespace cv;
using namespace std;
using namespace dnn;

vector<string> classes;        // Class names
float confThreshold = 0.5;     // Confidence threshold
float nmsThreshold = 0.4;      // Non-maximum suppression threshold
int inpWidth = 416;            // Network input width
int inpHeight = 416;           // Network input height

// Remove bounding boxes with low confidence
void postprocess(cv::Mat& frame, const vector<cv::Mat>& outs);
// Draw a predicted bounding box
void drawPred(int classId, float conf, int left, int top, int right, int bottom, cv::Mat& frame);
// Get the names of the output layers
vector<cv::String> getOutputNames(const cv::dnn::Net& net);

int main()
{
    string device = "cpu";
    string basePath = "/home/sxf/Desktop/yolov3/project/";

    // Load the class names (coco.names contains 80 class names)
    string classesFile = basePath + "model/coco.names";
    ifstream ifs(classesFile.c_str());
    string line;
    while (getline(ifs, line)) classes.push_back(line);

    // Model configuration and weight files
    cv::String modelConfiguration = basePath + "model/yolov3.cfg";
    cv::String modelWeights = basePath + "model/yolov3.weights";

    // Load the network
    cv::dnn::Net net = cv::dnn::readNetFromDarknet(modelConfiguration, modelWeights);
    if (device == "cpu") {
        cout << "Using CPU device" << endl;
        net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
        net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);  // was mistakenly a second setPreferableBackend call
    } else if (device == "gpu") {
        cout << "Using GPU device" << endl;
        net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
        net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);
    }

    // Open a video file / image file / camera stream
    cv::VideoCapture cap(basePath + "model/run.mp4");
    // Open the camera instead:
    // cv::VideoCapture cap(1);
    cv::Mat frame, blob;

    // Create a window
    static const string kWinName = "Deep learning object detection in OpenCV";
    cv::namedWindow(kWinName, cv::WINDOW_AUTOSIZE);

    // Process each frame
    while (cv::waitKey(1) < 0) {
        // Grab a frame
        cap >> frame;
        // Stop when the video ends
        if (frame.empty()) break;

        // Convert the frame into a blob for the DNN
        cv::dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(inpWidth, inpHeight));
        // Set the network input
        net.setInput(blob);
        // Run the forward pass and collect the outputs
        vector<cv::Mat> outs;
        net.forward(outs, getOutputNames(net));
        // Remove bounding boxes with low confidence
        postprocess(frame, outs);

        // Display the inference time
        vector<double> layersTimes;
        double freq = cv::getTickFrequency() / 1000;
        double t = net.getPerfProfile(layersTimes) / freq;
        string label = cv::format("Inference time for a frame: %.2f ms", t);
        cv::putText(frame, label, cv::Point(0, 15), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 255, 255));

        // Show the annotated frame
        cv::Mat detectedFrame;
        frame.convertTo(detectedFrame, CV_8U);
        cv::imshow(kWinName, frame);
    }
    cap.release();
    return 0;
}

// Remove bounding boxes with low confidence
void postprocess(cv::Mat& frame, const vector<cv::Mat>& outs)
{
    vector<int> classIds;        // Class index of each detection
    vector<float> confidences;   // Confidence of each detection
    vector<cv::Rect> boxes;      // Bounding box of each detection

    for (size_t i = 0; i < outs.size(); i++) {
        // Scan all bounding boxes output by the network and keep only those
        // with high confidence. The box data (x, y, w, h) are fractions of
        // the image size, with (x, y) the coordinates of the box center.
        float* data = (float*)outs[i].data;
        for (int j = 0; j < outs[i].rows; j++, data += outs[i].cols) {
            cv::Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
            cv::Point classIdPoint;
            double confidence;
            // Get the highest score and its index
            cv::minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
            if (confidence > confThreshold) {
                int centerX = (int)(data[0] * frame.cols);
                int centerY = (int)(data[1] * frame.rows);
                int width   = (int)(data[2] * frame.cols);
                int height  = (int)(data[3] * frame.rows);
                int left = centerX - width / 2;
                int top  = centerY - height / 2;
                classIds.push_back(classIdPoint.x);
                confidences.push_back((float)confidence);
                boxes.push_back(cv::Rect(left, top, width, height));
            }
        }
    }

    // Suppress overlapping boxes with non-maximum suppression
    vector<int> indices;  // Indices of the boxes that survive NMS
    cv::dnn::NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
    for (size_t i = 0; i < indices.size(); i++) {
        int idx = indices[i];
        cv::Rect box = boxes[idx];
        drawPred(classIds[idx], confidences[idx], box.x, box.y,
                 box.x + box.width, box.y + box.height, frame);
    }
}

// Draw a predicted bounding box
void drawPred(int classId, float conf, int left, int top, int right, int bottom, cv::Mat& frame)
{
    // Draw the bounding box
    cv::rectangle(frame, cv::Point(left, top), cv::Point(right, bottom), cv::Scalar(255, 178, 50), 3);

    string label = cv::format("%.2f", conf);
    if (!classes.empty()) {
        CV_Assert(classId < (int)classes.size());
        label = classes[classId] + ":" + label;  // Class name and confidence on the box
    }

    // Draw the label on top of the bounding box
    int baseLine;
    cv::Size labelSize = cv::getTextSize(label, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
    top = max(top, labelSize.height);
    cv::rectangle(frame, cv::Point(left, top - round(1.5 * labelSize.height)),
                  cv::Point(left + round(1.5 * labelSize.width), top + baseLine),
                  cv::Scalar(255, 255, 255), cv::FILLED);
    cv::putText(frame, label, cv::Point(left, top), cv::FONT_HERSHEY_SIMPLEX, 0.75, cv::Scalar(0, 0, 0), 1);
}

// Get the names of the output layers
vector<cv::String> getOutputNames(const cv::dnn::Net& net)
{
    static vector<cv::String> names;
    if (names.empty()) {
        // Indices of the unconnected (output) layers
        vector<int> outLayers = net.getUnconnectedOutLayers();
        vector<cv::String> layersNames = net.getLayerNames();
        // Map the indices to layer names
        names.resize(outLayers.size());
        for (size_t i = 0; i < outLayers.size(); i++) {
            names[i] = layersNames[outLayers[i] - 1];
        }
    }
    return names;
}
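The postprocess function above delegates non-maximum suppression to cv::dnn::NMSBoxes. For readers curious what that call does, here is a plain Python sketch of greedy NMS (the standard algorithm, not OpenCV's actual source): keep the highest-confidence box, discard every remaining box whose IoU with it exceeds the threshold, and repeat.

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, confidences, conf_threshold, nms_threshold):
    """Greedy NMS; returns the indices of the boxes to keep."""
    order = sorted(
        (i for i, c in enumerate(confidences) if c > conf_threshold),
        key=lambda i: confidences[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it overlaps no already-kept box too much
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep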
A reference CMakeLists.txt:
cmake_minimum_required(VERSION 3.22)
project(project)
set(CMAKE_CXX_STANDARD 17)

find_package(OpenCV REQUIRED)
find_package(Doxygen)
if (NOT APPLE)
    find_package(OpenMP)
endif ()

# ============================================================================ #
# Compilation flags
IF(UNIX)
    SET(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS} -g -O0 -Wall -Wextra -Wunused-variable -DDEBUG -D_DEBUG")
    SET(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS} -O0 -g -Wall -Wextra -Wunused-variable -DDEBUG -D_DEBUG")
ENDIF(UNIX)

if(OPENMP_FOUND)
    MESSAGE("OpenMP found")
    if(UNIX)
        SET(CMAKE_C_FLAGS_RELEASE "-O3 -Wall -Wextra -Wunused-variable -g -fPIC -msse2 -msse3 -msse4 -ffast-math")
        SET(CMAKE_CXX_FLAGS_RELEASE "-O3 -Wall -Wextra -Wunused-variable -g -fPIC -msse2 -msse3 -msse4 -ffast-math")
    endif(UNIX)
    SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
    SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
else(OPENMP_FOUND)
    MESSAGE("OpenMP not found")
    if(UNIX)
        SET(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS} -O3 -Wall -std=c++0x -Wunused-variable -Wno-unknown-pragmas -g -fPIC -msse2 -msse3 -msse4 -ffast-math")
        SET(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS} -O3 -Wall -std=c++0x -Wno-unknown-pragmas -Wunused-variable -g -fPIC -msse2 -msse3 -msse4 -ffast-math")
    endif(UNIX)
endif(OPENMP_FOUND)

# ============================================================================ #
include_directories(${OpenCV_INCLUDE_DIRS})
add_executable(project main.cpp)
target_link_libraries(project ${OpenCV_LIBS})

# ============================================================================ #
# Generate Doxygen-based documentation
if(DOXYGEN_FOUND)
    add_custom_target(akaze_documentation
        ${DOXYGEN_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/Doxyfile
        WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
        COMMENT "Generating API documentation with Doxygen" VERBATIM)
endif(DOXYGEN_FOUND)
Other YOLOv3 C++ usage
Third-party open-source code
Github link: GitHub - zqfang/YOLOv3_CPP: YOLOv3 C++
To be continued