RK3399 AI Recognition, Part 3

This article continues the series on performing AI recognition with a front-end webcam on the RK3399. We currently use the latest YOLOv5 model for recognition. The RKNN_API used with this model can be downloaded from the official RK3399 forums; the RK1808 driver must be updated to version 1.6 or later.

The official examples analyze a single picture and draw the recognition results back onto it. The Python API offers more and more interfaces, including direct stream-pull analysis. But how do we pull a stream through the C++ interface for analysis? Going further, can we analyze multiple streams at once? Multi-channel video recognition is feasible as long as full-frame-rate recognition is not required.

First, let us lay out the AI recognition pipeline:

Load model data -> initialize the compute stick -> pass the RGB image data in through the API -> parse the results -> draw the bounding boxes

From this pipeline we can see that loading the model and initializing the compute stick are fixed steps, and the inference inside the stick is not the user's concern. The user-defined parts are feeding in the RGB data and drawing boxes once the results come out (drawing is optional, not mandatory). Combined with the second article in this series, the idea is clear: we have already decoded the camera stream and obtained YUV data, so all that remains is to convert it to RGB data and feed it to the compute stick through the RKNN_API. This is where our RGA interface comes in handy: it can convert YUV data into RGB data (in my tests the data is actually converted to BGR).

Note that the RGB data passed in does not keep the original width and height. For the sake of analysis speed, pictures are generally downscaled. For example, for a model trained on 416x416 input with a 1280x720 (720P) camera, the 720P YUV data must be scaled to 416x416 RGB data before it can be analyzed correctly. So in this step you must know the input size of your trained model and scale accordingly. You could stretch the image directly to that size, but to better match the real scene and reduce the distortion of objects caused by scaling, we should scale the picture uniformly (preserving the aspect ratio) and then pad the remainder. The padding can be done with OpenCV's copyMakeBorder function. A reference letterbox function follows:

cv::Mat CV5Detect::letterbox(ffmpeg::RgaData &rData, int target_w, int target_h, int &wpad, int &hpad, float &scale)
{
    int in_w = rData.width;
    int in_h = rData.height;
    int tar_w = target_w;
    int tar_h = target_h;
    float r = std::min(float(tar_h) / in_h, float(tar_w) / in_w);
    int inside_w = round(in_w * r);
    int inside_h = round(in_h * r);
    int padd_w = tar_w - inside_w;
    int padd_h = tar_h - inside_h;
    wpad = padd_w;
    hpad = padd_h;
    scale = r;
    cv::Mat src(in_h, in_w, CV_8UC3, rData.data); // wrap the incoming BGR buffer (reconstructed: the data-pointer argument was lost in extraction)
    cv::Mat resize_img(inside_h, inside_w, CV_8UC3);
    ///If the width and height are not 4-aligned, fall back to software resize (lower performance)
    if (inside_w % 4 != 0 || inside_h % 4 != 0)
    {
        cv::resize(src, resize_img, cv::Size(inside_w, inside_h));
    }
    else
    {
        ffmpeg::RgaData rgaDataOut;
        rgaDataOut.type = ffmpeg::BGR24;
        rgaDataOut.width = inside_w;
        rgaDataOut.height = inside_h;
        rgaDataOut.data = m_picData.get(); // reconstructed: the member name before "= m_picData.get()" was lost in extraction
        ///Scaling through RGA is fast and does not occupy the CPU
        if (!m_pPeg->transRga(rData, rgaDataOut))
        {
            errorf("enc transRga failed\n");
            return resize_img;
        }
        resize_img = cv::Mat(inside_h, inside_w, CV_8UC3, rgaDataOut.data); // reconstructed: the copy-back of the RGA output was garbled in the original
    }
    // cv::imwrite("resize.jpg", resize_img);
    padd_w = padd_w / 2;
    padd_h = padd_h / 2;
    int top = int(round(padd_h - 0.1));
    int bottom = hpad - top;
    int left = int(round(padd_w - 0.1));
    int right = wpad - left;
    ///Pad the borders so the uniformly scaled picture matches the RGB input size expected by the compute stick
    cv::copyMakeBorder(resize_img, resize_img, top, bottom, left, right, cv::BORDER_CONSTANT, cv::Scalar(114, 114, 114));
    return resize_img;
}

With a single camera, AI recognition can essentially run at full frame rate, and decoding and display can both be done serially. For simultaneous multi-channel recognition, however, a serial design does not work: decoding, AI, and display must run asynchronously, and the recognition frame rate must be reduced.

This article concludes the RK3399 video series. The attachment package provides a self-contained set of decoding, AI, and display pipelines. AI recognition is based on the RK3399 plus RK1808 scheme; for the RK3399Pro a few changes are needed. The program only guarantees display and recognition quality at 720P and lower resolutions. The stream type is H264, which can be configured on the camera.

Program usage:

After downloading the program, unzip the package to any directory on the RK3399 and run run.sh. To configure the video sources, edit the config/QtUi/video_channel.json file according to the field explanations below. Restart the program for the configuration to take effect.

        "QtUiConfig" : 
                "channels" : --->Array, access multiple cameras. Up to 8 channels 720 P
                                "camType" : 0,
                                "enable" : true, --->Enable flow, setting true,Perform streaming decoding operation
                                "enableAI" : true, --->Enable, which can be carried out when the calculation rod is inserted AI distinguish
                                "ip" : "", --->camera IP,It can be filled in according to the actual situation
                                "name" : "chan2", --->Camera name, user-defined. In case of multiple channels, the name cannot be repeated
                                "nodeId" : "E024E47A524C45F8", --->This field will be generated automatically. It is recommended to delete it. If you fill it in manually, make sure it is different
                                "password" : "root12345", --->Camera password
                                "port" : 80, 
                                "procted_zone" : 
                                        "points" : [],
                                        "type" : ""
                                "profile" : "Profile_1",
                                "protocol" : "onvif",
                                "streamUrl" : "rtsp://admin: root12345@ : 554 / streaming / channels / 101 ", --- > camera rtsp streaming address. This is an example of Haikang. Other manufacturers' streaming addresses should be filled in by themselves
                                "user" : "admin", --->user name
                                "widgetId" : --->Playback window serial number
                "display" : true --->true For display QT Interface, false Do not show

It is recommended to connect a display so the stream and AI recognition results can be viewed in the QT interface. In the channel list at the lower left of the interface, double-click Channel List - All Channels, then double-click a specific channel, and it will play in the window on the right.

The secondary development interface and the running program require a license. Please contact WeChat HardAndBetter for the license, or join QQ group 586166104 for discussion.

Program download path:

        3399 video AI analysis demo - image recognition document resources - CSDN Download

If you have no CSDN points, it can also be downloaded from a Baidu netdisk. The path is as follows: Extraction code: 8jjd

Tags: Embedded system AI

Posted by alarik149 on Mon, 18 Apr 2022 19:38:56 +0930