Recently I've been reading the source code of PV-RCNN to deepen my understanding of the paper. My understanding of the code owes a lot to another author's notes, many thanks!
First, let's look at train.py, where the data is prepared before training:
train_set, train_loader, train_sampler = build_dataloader(
    dataset_cfg=cfg.DATA_CONFIG,
    class_names=cfg.CLASS_NAMES,
    batch_size=args.batch_size,
    dist=dist_train, workers=args.workers,
    logger=logger,
    training=True,
    merge_all_iters_to_one_epoch=args.merge_all_iters_to_one_epoch,
    total_epochs=args.epochs
)
cfg and args are the configuration and command-line arguments defined earlier in train.py.
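For context, cfg and args come roughly from a parse_config helper like the sketch below (a simplification; the real train.py exposes more options such as --extra_tag, --ckpt and the distributed-training flags):

import argparse
from pcdet.config import cfg, cfg_from_yaml_file


def parse_config():
    # command-line arguments consumed by build_dataloader further down
    parser = argparse.ArgumentParser(description='arg parser')
    parser.add_argument('--cfg_file', type=str, default='cfgs/kitti_models/pv_rcnn.yaml',
                        help='path of the model/dataset config')
    parser.add_argument('--batch_size', type=int, default=None, help='batch size per GPU')
    parser.add_argument('--epochs', type=int, default=None, help='number of training epochs')
    parser.add_argument('--workers', type=int, default=4, help='dataloader worker processes')
    args = parser.parse_args()

    # merge the yaml file into the global cfg (an EasyDict),
    # which then provides cfg.DATA_CONFIG, cfg.CLASS_NAMES, ...
    cfg_from_yaml_file(args.cfg_file, cfg)
    return args, cfg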
Let's step into build_dataloader and see what it does.
build_dataloader is defined in the __init__.py of the datasets package. First, the dataset named in the configuration file is selected and initialized:
dataset = __all__[dataset_cfg.DATASET](
    dataset_cfg=dataset_cfg,
    class_names=class_names,
    root_path=root_path,
    training=training,
    logger=logger,
)
__all__ = {
    'DatasetTemplate': DatasetTemplate,
    'KittiDataset': KittiDataset,
    'NuScenesDataset': NuScenesDataset,
    'WaymoDataset': WaymoDataset
}
__all__ is a dictionary that maps the DATASET name in the configuration file to the dataset class we want to preprocess, in our case KittiDataset.
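For reference, the dataset name comes straight from the dataset config yaml; for KITTI the relevant fields look roughly like this (abridged, values may differ between versions):

# tools/cfgs/dataset_configs/kitti_dataset.yaml (abridged)
DATASET: 'KittiDataset'
DATA_PATH: '../data/kitti'
POINT_CLOUD_RANGE: [0, -40, -3, 70.4, 40, 1]
DATA_SPLIT: {
    'train': train,
    'test': val
}
INFO_PATH: {
    'train': [kitti_infos_train.pkl],
    'test': [kitti_infos_val.pkl],
}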
KittiDataset is defined in kitti_dataset.py. Its __init__ looks like this:
def __init__(self, dataset_cfg, class_names, training=True, root_path=None, logger=None):
    """
    Args:
        root_path:
        dataset_cfg:
        class_names:
        training:
        logger:
    """
    # Initialize the parent class and assign the parameters to class attributes
    super().__init__(
        dataset_cfg=dataset_cfg, class_names=class_names, training=training, root_path=root_path, logger=logger
    )
    # Which split is used: the training set (train) or the validation set (val)
    self.split = self.dataset_cfg.DATA_SPLIT[self.mode]
    # root_path is /data/kitti/
    # The kitti dataset consists of three folders: "training", "testing" and "ImageSets"
    # For the train/val splits point to the "training" folder, otherwise to "testing"
    self.root_split_path = self.root_path / ('training' if self.split != 'test' else 'testing')
    # /data/kitti/ImageSets/ contains three files: test.txt, train.txt, val.txt
    # Pick the one matching the split
    split_dir = self.root_path / 'ImageSets' / (self.split + '.txt')
    # Read the txt file to build sample_id_list
    self.sample_id_list = [x.strip() for x in open(split_dir).readlines()] if split_dir.exists() else None
    # Create an empty list to store the kitti infos
    self.kitti_infos = []
    # Load the kitti data; mode is 'train' or 'test'
    self.include_kitti_data(self.mode)
It mainly reads the split txt file to build sample_id_list, and creates an empty kitti_infos list that will hold the dataset information later.
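Roughly, include_kitti_data loads the pre-generated pkl info files listed under INFO_PATH in the dataset config and extends kitti_infos with their contents; a slightly simplified sketch (logging omitted):

import pickle

def include_kitti_data(self, mode):
    # mode is 'train' or 'test'; INFO_PATH maps it to one or more pkl files
    kitti_infos = []
    for info_path in self.dataset_cfg.INFO_PATH[mode]:
        info_path = self.root_path / info_path
        if not info_path.exists():
            continue
        with open(info_path, 'rb') as f:
            # each pkl holds a list of per-frame info dicts (point_cloud, calib, annos, ...)
            kitti_infos.extend(pickle.load(f))
    self.kitti_infos.extend(kitti_infos)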
Skipping a few auxiliary functions, let's look directly at __getitem__. It reads the dataset information from the generated pkl file, determines the frame id from info['point_cloud']['lidar_idx'], reads the point cloud and the other info fields into a preliminary data_dict, passes it to prepare_data (defined in the parent class in dataset.py) for unified processing, and then returns it.
The first part handles the camera calibration and the transformation of the annotations into the lidar coordinate system; then self.prepare_data performs the data augmentation.
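Concretely, a stripped-down sketch of that __getitem__ flow might look like this (a simplification; it assumes the pcdet helpers self.get_lidar, self.get_calib, common_utils.drop_info_with_name and box_utils.boxes3d_kitti_camera_to_lidar, and omits the optional image, road-plane and FOV handling):

def __getitem__(self, index):
    # one per-frame info dict loaded from the pkl file
    info = copy.deepcopy(self.kitti_infos[index])
    sample_idx = info['point_cloud']['lidar_idx']

    # raw lidar points and calibration for this frame
    points = self.get_lidar(sample_idx)
    calib = self.get_calib(sample_idx)
    input_dict = {'points': points, 'frame_id': sample_idx, 'calib': calib}

    if 'annos' in info:
        annos = common_utils.drop_info_with_name(info['annos'], name='DontCare')
        loc, dims, rots = annos['location'], annos['dimensions'], annos['rotation_y']
        # camera-frame boxes -> lidar-frame boxes
        gt_boxes_camera = np.concatenate([loc, dims, rots[..., np.newaxis]], axis=1).astype(np.float32)
        gt_boxes_lidar = box_utils.boxes3d_kitti_camera_to_lidar(gt_boxes_camera, calib)
        input_dict.update({'gt_names': annos['name'], 'gt_boxes': gt_boxes_lidar})

    # unified augmentation / feature encoding / voxelization in the parent class
    data_dict = self.prepare_data(data_dict=input_dict)
    return data_dict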
Let's step in and see what it does. self.prepare_data is defined in dataset.py:
data_dict = self.data_augmentor.forward(
    data_dict={
        **data_dict,
        'gt_boxes_mask': gt_boxes_mask
    }
)
The incoming data is packed together with gt_boxes_mask and handed to the data augmentation function.
The forward function builds the augmentors from the names listed in the configuration file; the detailed augmentation operations are defined in data_augmentor.py. The relevant config section is:
DATA_AUGMENTOR:
    DISABLE_AUG_LIST: ['placeholder']
    AUG_CONFIG_LIST:
        - NAME: gt_sampling
          USE_ROAD_PLANE: True
          DB_INFO_PATH:
              - kitti_dbinfos_train.pkl
          PREPARE: {
             filter_by_min_points: ['Car:5', 'Pedestrian:5', 'Cyclist:5'],
             filter_by_difficulty: [-1],
          }
          SAMPLE_GROUPS: ['Car:20','Pedestrian:15', 'Cyclist:15']
          NUM_POINT_FEATURES: 4
          DATABASE_WITH_FAKELIDAR: False
          REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
          LIMIT_WHOLE_SCENE: True

        - NAME: random_world_flip
          ALONG_AXIS_LIST: ['x']

        - NAME: random_world_rotation
          WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]

        - NAME: random_world_scaling
          WORLD_SCALE_RANGE: [0.95, 1.05]
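As a rough sketch of how these entries become an augmentation pipeline (simplified from data_augmentor.py; the real class also handles the case where the config is a plain list): each NAME is looked up as a method of DataAugmentor, which returns a partial function bound to that config, and forward simply runs the queue.

class DataAugmentor:
    def __init__(self, root_path, augmentor_configs, class_names, logger=None):
        self.data_augmentor_queue = []
        for cur_cfg in augmentor_configs.AUG_CONFIG_LIST:
            if cur_cfg.NAME in augmentor_configs.DISABLE_AUG_LIST:
                continue
            # e.g. NAME == 'random_world_flip' calls self.random_world_flip(config=cur_cfg),
            # which returns a partial function bound to that config entry
            cur_augmentor = getattr(self, cur_cfg.NAME)(config=cur_cfg)
            self.data_augmentor_queue.append(cur_augmentor)

    def forward(self, data_dict):
        # apply gt_sampling, random flip / rotation / scaling, ... in config order
        for cur_augmentor in self.data_augmentor_queue:
            data_dict = cur_augmentor(data_dict=data_dict)
        return data_dict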
Then the ground truth of the classes we want to detect is filtered, and the point attributes to be used are selected:
# Filter the gt_boxes to be detected
if data_dict.get('gt_boxes', None) is not None:
    # Return the indices (np.array) of the entries of data_dict['gt_names'] that appear in class_names
    selected = common_utils.keep_arrays_by_name(data_dict['gt_names'], self.class_names)
    # Keep only the selected gt_boxes and gt_names
    data_dict['gt_boxes'] = data_dict['gt_boxes'][selected]
    data_dict['gt_names'] = data_dict['gt_names'][selected]
    # Map each category name in this frame's gt_names to its index in class_names
    # For example, suppose class_names = ['car', 'person'] and the current frame has
    # gt_names = ['car', 'person', 'car', 'car'] (three cars and one person);
    # then gt_classes = [1, 2, 1, 1]
    gt_classes = np.array([self.class_names.index(n) + 1 for n in data_dict['gt_names']], dtype=np.int32)
    # Append the class index as the last column of each gt_box
    gt_boxes = np.concatenate((data_dict['gt_boxes'], gt_classes.reshape(-1, 1).astype(np.float32)), axis=1)
    data_dict['gt_boxes'] = gt_boxes
    # If gt_boxes2d exists, also select the required 2D boxes according to selected
    if data_dict.get('gt_boxes2d', None) is not None:
        data_dict['gt_boxes2d'] = data_dict['gt_boxes2d'][selected]

# Select which attributes of each point are used, e.g. x, y, z, intensity
if data_dict.get('points', None) is not None:
    data_dict = self.point_feature_encoder.forward(data_dict)
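To make the index selection concrete, here is a tiny standalone example of what keep_arrays_by_name effectively computes (assuming it just returns the indices of the names that are in the class list):

import numpy as np

gt_names = np.array(['Car', 'Pedestrian', 'Car', 'Van'])
class_names = ['Car', 'Pedestrian', 'Cyclist']

# indices of gt boxes whose class is one we want to detect
selected = np.array([i for i, name in enumerate(gt_names) if name in class_names], dtype=np.int64)
print(selected)  # [0 1 2]  -> the 'Van' at index 3 is dropped

# class indices start at 1, so 0 can be reserved for background
gt_classes = np.array([class_names.index(n) + 1 for n in gt_names[selected]], dtype=np.int32)
print(gt_classes)  # [1 2 1]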
Then the point cloud is preprocessed: points outside the configured range are removed, the point order is shuffled, and the points are converted into voxels.
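These steps live in the data processor; the range filtering and shuffling boil down to something like the sketch below (a simplification; the actual voxelization is delegated to the spconv / VoxelGenerator backend and is not shown):

import numpy as np

def mask_points_by_range(points, limit_range):
    # keep points whose x/y fall inside [x_min, y_min, z_min, x_max, y_max, z_max]
    mask = (points[:, 0] >= limit_range[0]) & (points[:, 0] <= limit_range[3]) \
         & (points[:, 1] >= limit_range[1]) & (points[:, 1] <= limit_range[4])
    return points[mask]

def shuffle_points(points):
    # randomize point order so later truncation/sampling is unbiased
    shuffle_idx = np.random.permutation(points.shape[0])
    return points[shuffle_idx]

points = np.random.rand(1000, 4) * 80 - 40          # fake (x, y, z, intensity) cloud
points = mask_points_by_range(points, [0, -40, -3, 70.4, 40, 1])
points = shuffle_points(points)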
So dataset is essentially a class wrapping the specific dataset, whose __getitem__ also preprocesses the data.
Back in build_dataloader, the DataLoader is then initialized:
dataloader = DataLoader(
    dataset, batch_size=batch_size, pin_memory=True, num_workers=workers,
    shuffle=(sampler is None) and training, collate_fn=dataset.collate_batch,
    drop_last=False, sampler=sampler, timeout=0
)
When the DataLoader is initialized, no data is sampled or loaded yet; __getitem__ is only called during training to load data, batch_size samples at a time.
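Because different frames have different numbers of points and boxes, collate_batch cannot simply stack the samples. Roughly, it prepends a batch index to each point and zero-pads gt_boxes to the largest count in the batch; a simplified sketch (the real function also handles voxels, voxel_coords and more keys):

import numpy as np
from collections import defaultdict

def collate_batch_sketch(batch_list):
    data_dict = defaultdict(list)
    for cur_sample in batch_list:
        for key, val in cur_sample.items():
            data_dict[key].append(val)
    batch_size = len(batch_list)

    ret = {}
    for key, val in data_dict.items():
        if key == 'points':
            # prepend the sample index so points of different frames can be concatenated
            points_list = [np.pad(p, ((0, 0), (1, 0)), mode='constant', constant_values=i)
                           for i, p in enumerate(val)]
            ret[key] = np.concatenate(points_list, axis=0)
        elif key == 'gt_boxes':
            # zero-pad every frame's boxes to the maximum box count in the batch
            max_gt = max(len(x) for x in val)
            batch_gt_boxes = np.zeros((batch_size, max_gt, val[0].shape[-1]), dtype=np.float32)
            for i, boxes in enumerate(val):
                batch_gt_boxes[i, :len(boxes), :] = boxes
            ret[key] = batch_gt_boxes
        else:
            ret[key] = np.stack(val, axis=0)
    ret['batch_size'] = batch_size
    return ret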
For single-GPU training, the data is loaded through this DataLoader.
Finally, the dataset, dataloader and sampler are returned.
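Putting it together, the returned objects are used roughly like this in the training loop (a minimal sketch; model, optimizer and the loss wrapper model_func, i.e. what model_fn_decorator produces in OpenPCDet, are built elsewhere in train.py, and the real code also handles learning-rate scheduling, checkpointing and logging):

train_set, train_loader, train_sampler = build_dataloader(
    dataset_cfg=cfg.DATA_CONFIG, class_names=cfg.CLASS_NAMES,
    batch_size=args.batch_size, dist=dist_train, workers=args.workers,
    logger=logger, training=True
)

for epoch in range(args.epochs):
    if train_sampler is not None:
        # keep shuffling consistent across GPUs in distributed training
        train_sampler.set_epoch(epoch)
    for batch_dict in train_loader:
        # each batch_dict comes from collate_batch (the real code also moves it to GPU first)
        loss, tb_dict, disp_dict = model_func(model, batch_dict)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()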