YOLOv3 usage notes

Table of contents

Using a free DL environment

Google Cloud Platform (GCP) deep learning virtual machine (VM) (recommended!)

Google Colab Notebook

Amazon Web Services

Dataset annotation tool

Roboflow (recommended!)

CVAT

labelImg

Training visualization

wandb (recommended!)

tensorboardX

Installation and use of YOLOv3

Custom training of YOLOv3

Using YOLOv3 with OpenCV

Other YOLOv3 C++ usage

Other people's open source code

Using a free DL environment

Google Cloud Platform (GCP) deep learning virtual machine (VM) (recommended!)

You can get $300 of free credit.

Tutorial: GCP Quickstart · ultralytics/yolov5 Wiki · GitHub

Google Colab Notebook

Free to use, with a GPU provided.

Tutorial: https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb

Amazon Web Services

A free trial quota is available.

Link: Amazon AWS Free Tier (overseas region accounts)

Dataset annotation tool

Roboflow (recommended!)

Official website: Overview - Roboflow

Tutorial: How to Train YOLOv5 On a Custom Dataset

CVAT

Official website: Computer Vision Annotation Tool

Tutorial: How to use CVAT for computer vision [2022 updates]

labelImg

Official website: GitHub - tzutalin/labelImg: 🖍️ LabelImg is a graphical image annotation tool and label object bounding boxes in images

Tutorial: LabelImg for computer vision annotation

Training visualization

wandb (recommended!)

Official website: Weights & Biases

tensorboardX

Official website: GitHub - lanpa/tensorboardX: tensorboard for pytorch (and chainer, mxnet, numpy, ...)

Installation and use of YOLOv3

Official website: YOLO: Real-Time Object Detection

Official Github: GitHub - ultralytics/yolov3: YOLOv3 in PyTorch > ONNX > CoreML > TFLite

Official documentation: YOLOv5 Documentation

Official papers: https://arxiv.org/abs/1804.02767v1

Custom training of YOLOv3

Tutorial: Training YOLOv3: Deep Learning based Custom Object Detector | LearnOpenCV

The model is trained with the Darknet framework, which is written in C.

1. Download and compile

cd ~
git clone https://github.com/pjreddie/darknet
cd darknet
# Use nproc to check the number of available cores and adjust -j accordingly
make -j4

2. Prepare dataset

Download the dataset and split it into a training set (70%-90%) and a test set (10%-30%).

Example dataset split script, splitTrainAndTest.py:

import random
import os
import sys

def split_data_set(image_dir):
    f_val = open("test.txt", 'w')
    f_train = open("train.txt", 'w')

    path, dirs, files = next(os.walk(image_dir))
    data_size = len(files)

    ind = 0
    data_test_size = int(0.1 * data_size)
    # pick roughly 10% of the images at random for the test set
    test_array = random.sample(range(data_size), k=data_test_size)

    for f in os.listdir(image_dir):
        if f.lower().endswith(".jpg"):
            if ind in test_array:
                f_val.write(image_dir + '/' + f + '\n')
            else:
                f_train.write(image_dir + '/' + f + '\n')
            ind += 1

    f_val.close()
    f_train.close()

split_data_set(sys.argv[1])

Usage:

python3 splitTrainAndTest.py ./path/JPEGImages/

3. Label dataset

Use annotation software to label the dataset. Each line in a label file represents a single bounding box in the image and contains the following information about the box (a worked conversion example follows the list below):

<object-class-id> <center-x> <center-y> <width> <height>
  • Object class ID is an integer representing the class of the object, ranging from 0 to (number of classes - 1). In our current example there is only one category, so it is always 0.

  • center-x and center-y are the pixel coordinates of the center of the bounding box, divided by the image width and height respectively so that they are normalized to [0, 1].

  • Width and height are the pixel width and height of the bounding box, likewise divided by the image width and height.
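
For clarity, here is a minimal sketch (not part of the original workflow) showing how a pixel-space box given as (xmin, ymin, xmax, ymax) is converted into one YOLO label line; the function name and the example numbers are purely illustrative:

def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    # center and size of the box in pixels
    cx = (xmin + xmax) / 2.0
    cy = (ymin + ymax) / 2.0
    w = xmax - xmin
    h = ymax - ymin
    # normalize by the image width/height so every value lies in [0, 1]
    return f"{class_id} {cx / img_w:.6f} {cy / img_h:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# Example: a 200x300 box with top-left corner (100, 50) in a 640x480 image
print(to_yolo_line(0, 100, 50, 300, 350, 640, 480))
# -> 0 0.312500 0.416667 0.312500 0.625000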

The labeling software, the automatically generated class file, and the automatically generated per-image label files were shown as screenshots in the original post (not reproduced here).

4. Download the pretrained model

wget https://pjreddie.com/media/files/darknet53.conv.74

5. Prepare data file

In the .data file (darknet.data in this example), set the path information for each file:

classes = 1
train  = /home/sxf/Desktop/yolov3/darknet/datasets/train.txt
valid  = /home/sxf/Desktop/yolov3/darknet/datasets/test.txt
names = /home/sxf/Desktop/yolov3/darknet/datasets/classes.names
backup = /home/sxf/Desktop/yolov3/darknet/datasets/weights/

6. YOLO parameter configuration

Use the model configuration file darknet-yolov3.cfg.
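
For a single-class model, the fields that typically need adjusting in the cfg are classes in each [yolo] section and filters in the [convolutional] section immediately before it, where filters = (classes + 5) * 3 (18 for one class). The following is a rough sketch, not part of the original tutorial, that patches a standard yolov3.cfg template this way; the script name and invocation are illustrative:

# patch_yolov3_cfg.py - hypothetical helper, a sketch assuming a standard yolov3.cfg layout
import sys

def patch_cfg(src_path, dst_path, num_classes):
    filters = (num_classes + 5) * 3   # 3 anchors per scale * (x, y, w, h, objectness + classes)
    with open(src_path) as f:
        lines = f.read().splitlines()

    last_filters_idx = None           # index of the most recent "filters=" line seen
    in_yolo = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("["):
            in_yolo = (stripped == "[yolo]")
            if in_yolo and last_filters_idx is not None:
                # the conv layer right before a [yolo] layer gets the new filter count
                lines[last_filters_idx] = "filters=%d" % filters
                last_filters_idx = None
        elif stripped.startswith("filters="):
            last_filters_idx = i
        elif in_yolo and stripped.startswith("classes="):
            lines[i] = "classes=%d" % num_classes

    with open(dst_path, "w") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    # e.g. python3 patch_yolov3_cfg.py yolov3.cfg darknet-yolov3.cfg 1
    patch_cfg(sys.argv[1], sys.argv[2], int(sys.argv[3]))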

7. Start training

./darknet detector train /home/sxf/Desktop/yolov3/darknet/datasets/darknet.data /home/sxf/Desktop/yolov3/darknet/datasets/darknet-yolov3.cfg ./darknet53.conv.74 > ./train.log
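
Because the command redirects console output to ./train.log, training progress can be tracked from that file. Below is a rough helper sketch (not from the tutorial) that plots the average loss; it assumes Darknet iteration lines of the form "100: 3.26, 4.07 avg, 0.001 rate, ..." appear in the log, which may also require redirecting stderr depending on the build:

# plot_darknet_loss.py - hypothetical helper; the log format is an assumption, adjust the regex if needed
import re
import sys
import matplotlib.pyplot as plt

def parse_avg_loss(log_path):
    iters, losses = [], []
    # matches lines like "100: 3.261732, 4.072200 avg, 0.001000 rate, ..."
    pattern = re.compile(r"^\s*(\d+):\s*[\d.]+,\s*([\d.]+)\s+avg")
    with open(log_path) as f:
        for line in f:
            m = pattern.match(line)
            if m:
                iters.append(int(m.group(1)))
                losses.append(float(m.group(2)))
    return iters, losses

if __name__ == "__main__":
    it, loss = parse_avg_loss(sys.argv[1])   # e.g. python3 plot_darknet_loss.py train.log
    plt.plot(it, loss)
    plt.xlabel("iteration")
    plt.ylabel("average loss")
    plt.show()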

8. Test model

python3 object_detection_yolo.py --image=image.jpg
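
object_detection_yolo.py is the script from the LearnOpenCV tutorial. As a quick sanity check of the trained weights, the sketch below does the same kind of thing with OpenCV's DNN module in Python; the cfg/weights/names paths and the weight file name are assumptions, substitute your own:

# detect_once.py - minimal sanity-check sketch, not the tutorial script itself
import cv2
import numpy as np

cfg = "datasets/darknet-yolov3.cfg"
weights = "datasets/weights/darknet-yolov3_final.weights"   # hypothetical file name in the backup dir
names = "datasets/classes.names"
conf_threshold, nms_threshold = 0.5, 0.4

classes = open(names).read().strip().split("\n")
net = cv2.dnn.readNetFromDarknet(cfg, weights)

img = cv2.imread("image.jpg")
h, w = img.shape[:2]
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outs = net.forward(net.getUnconnectedOutLayersNames())

boxes, confidences, class_ids = [], [], []
for out in outs:
    for det in out:
        scores = det[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > conf_threshold:
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(confidence)
            class_ids.append(class_id)

# non-maximum suppression keeps only the best of overlapping boxes
for i in cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold):
    i = int(np.asarray(i).flatten()[0])   # older OpenCV returns nested arrays
    x, y, bw, bh = boxes[i]
    cv2.rectangle(img, (x, y), (x + bw, y + bh), (255, 178, 50), 2)
    cv2.putText(img, "%s: %.2f" % (classes[class_ids[i]], confidences[i]),
                (x, max(y - 5, 15)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1)

cv2.imwrite("prediction.jpg", img)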

Using YOLOv3 with OpenCV

OpenCV 4.0 already includes the DNN module, which makes it very convenient to run a trained YOLOv3 model. On CPU, OpenCV's implementation is reported to be about 9x faster than Darknet.

Related tutorials and links:

Installation of OpenCV:

Installing OpenCV on each platform - Xiaofeng's CSDN blog

OpenCV tutorial collection:

GitHub - spmallick/learnopencv: Learn OpenCV : C++ and Python Examples

Some possible problems of YOLO:

Record of YOLO-related problems - Xiaofeng's CSDN blog

Upgrading CMake to 3.22 on Ubuntu 20.04:

Ubuntu 20.04: upgrade CMake to 3.22 (for Raspberry Pi) - Xiaofeng's CSDN blog

Download the weights and model configuration files:

#!/bin/bash
wget "https://raw.githubusercontent.com/spmallick/learnopencv/master/ObjectDetection-YOLO/yolov3.cfg"
wget "https://pjreddie.com/media/files/yolov3.weights"
wget "https://raw.githubusercontent.com/spmallick/learnopencv/master/ObjectDetection-YOLO/coco.names"
wget "https://raw.githubusercontent.com/spmallick/learnopencv/master/ObjectDetection-YOLO/run.mp4"

Grant execute permission and run the script:

sudo chmod +x download.sh 
./download.sh

Write the C++ code

#include <iostream>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/dnn/all_layers.hpp>
#include <fstream>

using namespace cv;
using namespace std;
using namespace dnn;

vector<string> classes;//Container for storing names
float confThreshold = 0.5;//Confidence threshold
float nmsThreshold = 0.4;//Non maximum suppression threshold
int inpWidth = 416;//Network input picture width
int inpHeight = 416;//Network input picture height
//Remove low confidence bounding box
void postprocess(cv::Mat& frame,const vector<cv::Mat>& out);
//Draw the prediction bounding box
void drawPred(int classId,float conf,int left,int top,int right,int bottom,cv::Mat& frame);
//Gets the name of the output layer
vector<cv::String> getOutputNames(const cv::dnn::Net& net);

int main() {
    string device = "cpu";
    string basePath = "/home/sxf/Desktop/yolov3/project/";

    //Save the class name into the container
    string classesFile = basePath+"model/coco.names";//coco.names contains 80 different class names
    ifstream ifs(classesFile.c_str());
    string line;
    while(getline(ifs,line)) classes.push_back(line);

    //Get the configuration and weight file of the model
    cv::String modelConfiguration = basePath+"model/yolov3.cfg";
    cv::String modelWeights = basePath+"model/yolov3.weights";

    //Load network
    cv::dnn::Net net = cv::dnn::readNetFromDarknet(modelConfiguration, modelWeights);
    if (device == "cpu")
    {
        cout << "Using CPU device" << endl;
        net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV);
        net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);
    }
    else if (device == "gpu")
    {
        cout << "Using GPU device" << endl;
        net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
        net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);
    }


    //Open a video file or graphics file or camera data stream
    cv::VideoCapture cap(basePath+"model/run.mp4");
    //Turn on the camera
    //cv::VideoCapture cap(1);
    cv::VideoWriter video;
    string str, outputFile;
    cv::Mat frame, blob;

    //create a window
    static const string kWinName = "Deep learning object detection in OpenCV";
    cv::namedWindow(kWinName,cv::WINDOW_AUTOSIZE);

    //Process each frame
    while(cv::waitKey(1)<0){
        //Take each frame of image
        cap>>frame;
        //If the video is finished, stop the program
        if(frame.empty()){
            break;
        }
        //Loading pictures from disk in dnn
        cv::dnn::blobFromImage(frame,blob,1/255.0,cv::Size(inpWidth,inpHeight));
        //Set input network
        net.setInput(blob);
        //Set output layer
        vector<cv::Mat> outs;//Store identification results
        net.forward(outs,getOutputNames(net));
        //Remove low confidence bounding box
        postprocess(frame,outs);
        //Display the inference time for this frame
        vector<double> layersTimes;
        double freq = cv::getTickFrequency()/1000;
        double t = net.getPerfProfile(layersTimes)/freq;
        string label = cv::format("Infercence time for a frame:%.2f ms",t);
        cv::putText(frame,label,cv::Point(0,15),cv::FONT_HERSHEY_SIMPLEX,0.5,cv::Scalar(0,255,255));
        //Convert the frame to 8-bit (needed if writing it out to a video file)
        cv::Mat detectedFrame;
        frame.convertTo(detectedFrame,CV_8U);
        cv::imshow(kWinName,frame);
    }
    cap.release();

    return 0;
}

//Remove low confidence bounding box
void postprocess(cv::Mat& frame,const vector<cv::Mat>& outs){
    vector<int> classIds;//Store index of identification class
    vector<float> confidences;//Storage confidence
    vector<cv::Rect> boxes;//Save border

    for(size_t i=0;i<outs.size();i++){
        //Scan all bounding boxes from network output
        //Retain high confidence checkbox
        //Target data:x,y,w,h are percentages, x,y are coordinates of target center point
        float* data = (float*)outs[i].data;
        for(int j=0;j<outs[i].rows;j++,data+=outs[i].cols){
            cv::Mat scores = outs[i].row(j).colRange(5,outs[i].cols);
            cv::Point classIdPoint;
            double confidence;//Confidence
            //Get the maximum score and index
            cv::minMaxLoc(scores,0,&confidence,0,&classIdPoint);
            if(confidence>confThreshold){
                int centerX = (int)(data[0]*frame.cols);
                int centerY = (int)(data[1]*frame.rows);
                int width = (int)(data[2]*frame.cols);
                int height = (int)(data[3]*frame.rows);
                int left = centerX-width/2;
                int top = centerY-height/2;

                classIds.push_back(classIdPoint.x);
                confidences.push_back((float)confidence);
                boxes.push_back(cv::Rect(left, top, width, height));
            }

        }

    }

    //Perform non-maximum suppression
    vector<int> indices;//Save indexes without overlapping borders
    //This function suppresses overlapping borders
    cv::dnn::NMSBoxes(boxes,confidences,confThreshold,nmsThreshold,indices);
    for(size_t i=0;i<indices.size();i++){
        int idx = indices[i];
        cv::Rect box = boxes[idx];
        drawPred(classIds[idx],confidences[idx],box.x,box.y,
                 box.x+box.width,box.y+box.height,frame);
    }
}

//Draw prediction bounding box
void drawPred(int classId,float conf,int left,int top,int right,int bottom,cv::Mat& frame){
    //Draw bounding box
    cv::rectangle(frame,cv::Point(left,top),cv::Point(right,bottom),cv::Scalar(255,178,50),3);

    string label = cv::format("%.2f",conf);
    if(!classes.empty()){
        CV_Assert(classId < (int)classes.size());
        label = classes[classId]+":"+label;//Category label and confidence on the border
    }
    //Draw the label on the bounding box
    int baseLine;
    cv::Size labelSize = cv::getTextSize(label,cv::FONT_HERSHEY_SIMPLEX,0.5,1,&baseLine);
    top = max(top,labelSize.height);
    cv::rectangle(frame,cv::Point(left,top-round(1.5*labelSize.height)),cv::Point(left+round(1.5*labelSize.width),top+baseLine),cv::Scalar(255,255,255),cv::FILLED);
    cv::putText(frame, label,cv::Point(left, top), cv::FONT_HERSHEY_SIMPLEX, 0.75,cv::Scalar(0, 0, 0), 1);
}

//Get the name from the output layer
vector<cv::String> getOutputNames(const cv::dnn::Net& net){
    static vector<cv::String> names;
    if(names.empty()){
        //Get the indices of the unconnected output layers
        vector<int> outLayers = net.getUnconnectedOutLayers();
        vector<cv::String> layersNames = net.getLayerNames();
        //Get output layer name
        names.resize(outLayers.size());
        for(size_t i =0;i<outLayers.size();i++){
            names[i] = layersNames[outLayers[i]-1];
        }
    }
    return names;
}

A reference CMakeLists.txt:

cmake_minimum_required(VERSION 3.22)
project(project)

set(CMAKE_CXX_STANDARD 17)

find_package(OpenCV REQUIRED)
find_package(Doxygen)

if (NOT APPLE)
    find_package(OpenMP)
endif ()

# ============================================================================ #
# Compilation flags
IF(UNIX)
    SET(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS} -g -O0  -Wall -Wextra -Wunused-variable -DDEBUG -D_DEBUG")
    SET(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS} -O0 -g  -Wall -Wextra -Wunused-variable -DDEBUG -D_DEBUG")
ENDIF(UNIX)

if(OPENMP_FOUND)
    MESSAGE("OpenMP found")
    if(UNIX)
        SET(CMAKE_C_FLAGS_RELEASE "-O3  -Wall -Wextra -Wunused-variable  -g -fPIC -msse2 -msse3 -msse4 -ffast-math")
        SET(CMAKE_CXX_FLAGS_RELEASE "-O3 -Wall -Wextra -Wunused-variable -g -fPIC -msse2 -msse3 -msse4 -ffast-math")
    endif(UNIX)
    SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
    SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
else(OPENMP_FOUND)
    MESSAGE("OpenMP not found")
    if(UNIX)
        SET(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS} -O3 -Wall -std=c++0x -Wunused-variable -Wno-unknown-pragmas -g -fPIC -msse2 -msse3 -msse4 -ffast-math")
        SET(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS} -O3 -Wall -std=c++0x -Wno-unknown-pragmas -Wunused-variable -g -fPIC -msse2 -msse3 -msse4 -ffast-math")
    endif(UNIX)
endif(OPENMP_FOUND)

# ============================================================================ #
include_directories( ${OpenCV_INCLUDE_DIRS})


add_executable(project main.cpp)
target_link_libraries(project ${OpenCV_LIBS})

# ============================================================================ #
# Generate Doxygen-based documentation project
if(DOXYGEN_FOUND)
    add_custom_target(akaze_documentation
            ${DOXYGEN_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/Doxyfile
            WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
            COMMENT "Generating API documentation with Doxygen" VERBATIM)
endif(DOXYGEN_FOUND)

Other YOLOv3 C++ usage

Other people's open source code

Github link: GitHub - zqfang/YOLOv3_CPP: YOLOv3 C++

To be continued
