[Code reading] [3D object detection] PV-RCNN code reading: data preparation

Recently I have been reading the PV-RCNN source code to deepen my understanding of the paper. My understanding of the source code owes a lot to notes written by an expert in the community, thank you!
First, let's look at train.py, which prepares the data before training:

    train_set, train_loader, train_sampler = build_dataloader(
        dataset_cfg=cfg.DATA_CONFIG,
        class_names=cfg.CLASS_NAMES,
        batch_size=args.batch_size,
        dist=dist_train, workers=args.workers,
        logger=logger,
        training=True,
        merge_all_iters_to_one_epoch=args.merge_all_iters_to_one_epoch,
        total_epochs=args.epochs
    )

cfg and args are the configuration and the command-line arguments defined earlier.
Let's step into build_dataloader and see what it does.
build_dataloader is defined in __init__.py. First, the DATASET entry in the configuration file selects which dataset class to initialize:

    dataset = __all__[dataset_cfg.DATASET](
        dataset_cfg=dataset_cfg,
        class_names=class_names,
        root_path=root_path,
        training=training,
        logger=logger,
    )

    __all__ = {
        'DatasetTemplate': DatasetTemplate,
        'KittiDataset': KittiDataset,
        'NuScenesDataset': NuScenesDataset,
        'WaymoDataset': WaymoDataset
    }

__all__ is a dictionary, not a function. It maps the dataset name given in the configuration file to the dataset class we want to preprocess; here that is KittiDataset.
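
The lookup through __all__ is a plain name-to-class registry. Below is a minimal sketch of that pattern. The classes are hypothetical stand-ins, not the real dataset classes, and `DATASET_REGISTRY` and `build_dataset` are names made up for illustration.

```python
# Hypothetical stand-ins for the real dataset classes; only the lookup pattern matters.
class DatasetTemplate:
    def __init__(self, dataset_cfg, class_names, training=True):
        self.dataset_cfg = dataset_cfg
        self.class_names = class_names
        self.training = training


class KittiDataset(DatasetTemplate):
    pass


class NuScenesDataset(DatasetTemplate):
    pass


# Name -> class registry, playing the same role as __all__ in the walkthrough.
DATASET_REGISTRY = {
    'DatasetTemplate': DatasetTemplate,
    'KittiDataset': KittiDataset,
    'NuScenesDataset': NuScenesDataset,
}


def build_dataset(dataset_cfg, class_names, training=True):
    """Look up the class named in the config and instantiate it."""
    name = dataset_cfg['DATASET']
    if name not in DATASET_REGISTRY:
        raise KeyError(f'Unknown dataset: {name}')
    return DATASET_REGISTRY[name](
        dataset_cfg=dataset_cfg, class_names=class_names, training=training
    )


dataset = build_dataset({'DATASET': 'KittiDataset'}, ['Car', 'Pedestrian', 'Cyclist'])
print(type(dataset).__name__)
```

With this pattern, switching datasets only requires changing the string in the configuration; the calling code stays the same.
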
KittiDataset is defined in kitti_dataset.py. We first look at its initialization:

    def __init__(self, dataset_cfg, class_names, training=True, root_path=None, logger=None):
        """
        Args:
            root_path:
            dataset_cfg:
            class_names:
            training:
            logger:
        """
        # Initialize the parent class, which stores the arguments as attributes
        super().__init__(
            dataset_cfg=dataset_cfg, class_names=class_names, training=training, root_path=root_path, logger=logger
        )
        # Choose the split (e.g. train or val) that corresponds to the current mode
        self.split = self.dataset_cfg.DATA_SPLIT[self.mode]
        # root_path is the dataset root, /data/kitti/
        # The KITTI dataset root contains three folders: "training", "testing" and "ImageSets"
        # Use the "training" folder unless the split is "test", in which case use "testing"
        self.root_split_path = self.root_path / ('training' if self.split != 'test' else 'testing')
        # /data/kitti/ImageSets/ contains three files: test.txt, train.txt and val.txt
        # Select the one that matches the split
        split_dir = self.root_path / 'ImageSets' / (self.split + '.txt')
        # Read the frame IDs from the txt file into the list sample_id_list
        self.sample_id_list = [x.strip() for x in open(split_dir).readlines()] if split_dir.exists() else None
        # Create an empty list for storing the KITTI infos
        self.kitti_infos = []
        # Call the function that loads the KITTI infos. The value of mode is train or test
        self.include_kitti_data(self.mode)

The initializer mainly reads the txt file of the dataset to build sample_id_list, and creates an empty list, kitti_infos, that is filled with information later.
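
The split handling can be tried in isolation. The sketch below builds a temporary directory with a fake ImageSets/train.txt and reads it back. The helper names `load_sample_ids` and `split_folder` are made up for illustration, and unlike the original one-liner this version also skips blank lines and closes the file.

```python
import tempfile
from pathlib import Path


def load_sample_ids(root_path, split):
    """Return the frame IDs listed in ImageSets/<split>.txt, or None if the file is missing."""
    split_file = Path(root_path) / 'ImageSets' / (split + '.txt')
    if not split_file.exists():
        return None
    with open(split_file) as f:
        # One frame ID per line; blank lines are skipped (a small deviation from the original).
        return [line.strip() for line in f if line.strip()]


def split_folder(root_path, split):
    """'training' holds both the train and val frames; only the test split uses 'testing'."""
    return Path(root_path) / ('training' if split != 'test' else 'testing')


# Build a tiny fake dataset root so the sketch runs without the real KITTI data.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'ImageSets').mkdir()
    (root / 'ImageSets' / 'train.txt').write_text('000000\n000003\n000007\n')

    train_ids = load_sample_ids(root, 'train')
    missing_ids = load_sample_ids(root, 'val')   # val.txt was not created
    train_dir = split_folder(root, 'train').name
    test_dir = split_folder(root, 'test').name

print(train_ids, missing_ids, train_dir, test_dir)
```
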
We skip some helper functions and go straight to the __getitem__ function. It reads the dataset information from the generated pkl file, determines the frame ID from info['point_cloud']['lidar_idx'], reads the point data and the other info fields into a preliminary data_dict, and passes that to prepare_data (defined in the parent class in dataset.py) for unified processing before returning the result.
The first part of __getitem__ deals with calibration and with converting between the camera and lidar coordinate systems for the images and point clouds. The self.prepare_data function then augments the data.
Let's step into it to see what it does.
The self.prepare_data function is in the dataset.py file:

            data_dict = self.data_augmentor.forward(
                data_dict={
                    **data_dict,
                    'gt_boxes_mask': gt_boxes_mask
                }
            )

The incoming data is packed into a dictionary and passed to the data augmentation function.
The forward function applies the augmentors selected by the NAME entries in the configuration file. The detailed augmentation operations are defined in data_augmentor:

    DATA_AUGMENTOR:
        DISABLE_AUG_LIST: ['placeholder']
        AUG_CONFIG_LIST:
            - NAME: gt_sampling
              USE_ROAD_PLANE: True
              DB_INFO_PATH:
                  - kitti_dbinfos_train.pkl
              PREPARE: {
                 filter_by_min_points: ['Car:5', 'Pedestrian:5', 'Cyclist:5'],
                 filter_by_difficulty: [-1],
              }

              SAMPLE_GROUPS: ['Car:20','Pedestrian:15', 'Cyclist:15']
              NUM_POINT_FEATURES: 4
              DATABASE_WITH_FAKELIDAR: False
              REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
              LIMIT_WHOLE_SCENE: True

            - NAME: random_world_flip
              ALONG_AXIS_LIST: ['x']

            - NAME: random_world_rotation
              WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]

            - NAME: random_world_scaling
              WORLD_SCALE_RANGE: [0.95, 1.05]
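
One way to turn NAME entries like these into a processing pipeline is to look up a method of the same name and queue it. The sketch below is a simplified, hypothetical augmentor (`TinyAugmentor` is not the real class) that implements only a flip and a scaling step. It assumes boxes are stored as (x, y, z, dx, dy, dz, heading) and points as (x, y, z, intensity).

```python
import numpy as np


class TinyAugmentor:
    """Hypothetical, simplified augmentor: queues methods whose names match the config."""

    def __init__(self, aug_config_list, disable_list=(), seed=0):
        self.rng = np.random.default_rng(seed)
        self.queue = []
        for cfg in aug_config_list:
            if cfg['NAME'] in disable_list:
                continue  # skip augmentations listed as disabled
            # Look up the method whose name equals the NAME entry.
            self.queue.append((getattr(self, cfg['NAME']), cfg))

    def random_world_flip(self, data, cfg):
        # With probability 0.5, mirror the scene across the x axis:
        # negate y of points and boxes, and negate the box heading.
        if 'x' in cfg['ALONG_AXIS_LIST'] and self.rng.random() < 0.5:
            data['points'][:, 1] *= -1
            data['gt_boxes'][:, 1] *= -1
            data['gt_boxes'][:, 6] *= -1
        return data

    def random_world_scaling(self, data, cfg):
        low, high = cfg['WORLD_SCALE_RANGE']
        scale = self.rng.uniform(low, high)
        data['points'][:, :3] *= scale    # scale coordinates, leave intensity alone
        data['gt_boxes'][:, :6] *= scale  # scale box centre and size, not heading
        return data

    def forward(self, data):
        for method, cfg in self.queue:
            data = method(data, cfg)
        return data


aug_config_list = [
    {'NAME': 'random_world_flip', 'ALONG_AXIS_LIST': ['x']},
    {'NAME': 'random_world_scaling', 'WORLD_SCALE_RANGE': [0.95, 1.05]},
]
augmentor = TinyAugmentor(aug_config_list, disable_list=['placeholder'])
data = {
    'points': np.array([[10.0, 2.0, -1.0, 0.3]]),
    'gt_boxes': np.array([[10.0, 2.0, -1.0, 3.9, 1.6, 1.5, 0.2]]),
}
out = augmentor.forward({k: v.copy() for k, v in data.items()})
print(out['points'], out['gt_boxes'])
```

Points and boxes are transformed together so that the labels stay aligned with the augmented point cloud.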


Next, the ground-truth (gt) boxes of the classes to be detected are filtered, and the point attributes to be used are selected:

        # Keep only the gt_boxes of the classes to be detected
        if data_dict.get('gt_boxes', None) is not None:
            # Indices (np.array) of the entries of data_dict['gt_names'] that appear in class_names
            selected = common_utils.keep_arrays_by_name(data_dict['gt_names'], self.class_names)
            # Use selected to pick the desired gt_boxes and gt_names
            data_dict['gt_boxes'] = data_dict['gt_boxes'][selected]
            data_dict['gt_names'] = data_dict['gt_names'][selected]
            # Map each category name in this frame's gt_names to its index in class_names, plus one
            # For example, suppose the classes we want to detect are class_names = 'car', 'person'
            # and the current frame has gt_names = 'car', 'person', 'car', 'car' (three cars and one person).
            # The resulting indices are gt_classes = 1, 2, 1, 1
            gt_classes = np.array([self.class_names.index(n) + 1 for n in data_dict['gt_names']], dtype=np.int32)
            # Append the class index as the last column of each row of gt_boxes
            gt_boxes = np.concatenate((data_dict['gt_boxes'], gt_classes.reshape(-1, 1).astype(np.float32)), axis=1)
            data_dict['gt_boxes'] = gt_boxes

            # If gt_boxes2d is present, keep only the entries given by selected
            if data_dict.get('gt_boxes2d', None) is not None:
                data_dict['gt_boxes2d'] = data_dict['gt_boxes2d'][selected]

        # Choose which point attributes are used, such as x, y, z and so on
        if data_dict.get('points', None) is not None:
            data_dict = self.point_feature_encoder.forward(data_dict)
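
The filtering and class-index step above can be reproduced on a toy frame. In this sketch `keep_by_name` is a hypothetical stand-in for common_utils.keep_arrays_by_name, assumed to return the indices of the names that are in the class list, and the box values are dummy numbers.

```python
import numpy as np

class_names = ['Car', 'Pedestrian']
# One frame with five labelled objects; 'Van' is not a class we want to detect.
gt_names = np.array(['Car', 'Pedestrian', 'Van', 'Car', 'Car'])
gt_boxes = np.arange(5 * 7, dtype=np.float32).reshape(5, 7)  # dummy 7-value boxes


def keep_by_name(names, used_classes):
    """Stand-in helper: indices of the entries whose name is in used_classes."""
    return np.array([i for i, n in enumerate(names) if n in used_classes], dtype=np.int64)


selected = keep_by_name(gt_names, class_names)
gt_names = gt_names[selected]
gt_boxes = gt_boxes[selected]

# Class indices start at 1.
gt_classes = np.array([class_names.index(n) + 1 for n in gt_names], dtype=np.int32)
# Append the class index as an eighth column.
gt_boxes = np.concatenate((gt_boxes, gt_classes.reshape(-1, 1).astype(np.float32)), axis=1)

print(selected)        # [0 1 3 4]
print(gt_classes)      # [1 2 1 1]
print(gt_boxes.shape)  # (4, 8)
```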

The point cloud is then preprocessed: points outside the configured range are removed, the order of the points is shuffled, and the point cloud is converted into voxels.
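
These three steps can be sketched with NumPy as follows. This is a simplified illustration, not the library's implementation: the function names, the range and the voxel size are assumptions chosen for the example, and real voxelization also groups the points per voxel, which is omitted here.

```python
import numpy as np


def mask_points_by_range(points, pc_range):
    """Keep points whose x, y, z lie inside [x_min, y_min, z_min, x_max, y_max, z_max]."""
    low, high = np.asarray(pc_range[:3]), np.asarray(pc_range[3:])
    mask = np.all((points[:, :3] >= low) & (points[:, :3] <= high), axis=1)
    return points[mask]


def shuffle_points(points, rng):
    """Return the same points in a random order."""
    return points[rng.permutation(points.shape[0])]


def voxel_coords(points, pc_range, voxel_size):
    """Integer (ix, iy, iz) voxel index of every point, counted from the range minimum."""
    offset = points[:, :3] - np.asarray(pc_range[:3])
    return np.floor(offset / np.asarray(voxel_size)).astype(np.int64)


# Illustrative values only (x, y, z, intensity per point).
pc_range = [0.0, -40.0, -3.0, 70.4, 40.0, 1.0]
voxel_size = [0.5, 0.5, 1.0]
points = np.array([
    [1.0, 0.0, 0.0, 0.5],     # inside the range
    [80.0, 0.0, 0.0, 0.1],    # x too large
    [5.0, -50.0, 0.0, 0.2],   # y too small
    [10.25, 3.3, -1.0, 0.9],  # inside the range
])

in_range = mask_points_by_range(points, pc_range)
coords = voxel_coords(in_range, pc_range, voxel_size)
shuffled = shuffle_points(in_range, np.random.default_rng(0))

print(in_range.shape)  # (2, 4)
print(coords)          # [[ 2 80  3] [20 86  2]]
```
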
From this we can see that building the dataset creates a dataset class and also defines how the data of that class is preprocessed.
Then we initialize the dataloader:

    dataloader = DataLoader(
        dataset, batch_size=batch_size, pin_memory=True, num_workers=workers,
        shuffle=(sampler is None) and training, collate_fn=dataset.collate_batch,
        drop_last=False, sampler=sampler, timeout=0
    )

Initializing the DataLoader does not sample or load any data yet. Only during training is __getitem__ called to load the data, one batch of batch_size samples at a time.
In single-GPU training, the data is loaded through this DataLoader.
Finally, dataset, dataloader and sampler are returned.
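
The lazy behaviour can be illustrated without PyTorch. The sketch below uses a toy generator as a stand-in for a data loader (it is not the real DataLoader) and a dataset that records every __getitem__ call, which makes it visible that nothing is read until a batch is requested.

```python
class CountingDataset:
    """Toy dataset that records every index requested through __getitem__."""

    def __init__(self, num_samples):
        self.num_samples = num_samples
        self.requested = []

    def __len__(self):
        return self.num_samples

    def __getitem__(self, index):
        self.requested.append(index)
        return {'frame_id': f'{index:06d}'}


def toy_loader(dataset, batch_size):
    """Generator stand-in for a data loader: yields one list of samples per batch."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))  # keep the last, smaller batch
        yield [dataset[i] for i in range(start, stop)]


dataset = CountingDataset(num_samples=5)
loader = toy_loader(dataset, batch_size=2)
calls_before = len(dataset.requested)       # creating the loader loads nothing

first_batch = next(loader)                  # the first batch triggers two __getitem__ calls
calls_after_first = len(dataset.requested)

remaining = list(loader)                    # the rest of the epoch
print(calls_before, calls_after_first, len(remaining))
```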

Tags: Deep Learning

Posted by FRSH on Thu, 14 Apr 2022 16:36:14 +0930