Links to the other PointPillars-related articles:
- [Paper Reading] CVPR 2019 | PointPillars: Fast Encoders for Object Detection from Point Clouds
- Installing and testing OpenPCDet v0.5
- Reading the PointPillars code in OpenPCDet, Part 1: data augmentation and data processing
- Reading the PointPillars code in OpenPCDet, Part 2: network structure
- Reading the PointPillars code in OpenPCDet, Part 3: computing the loss
In the previous article, we walked through data augmentation and data processing and obtained the corresponding pillar data. Below, we continue with the network structure of PointPillars.
The overall network consists of four modules, covered in order below: VFE, MAP_TO_BEV, BACKBONE_2D, and DENSE_HEAD.
1. VFE
Function: a simplified version of the PointNet network. It takes the pillar data (N, 4) produced by data augmentation and preprocessing and, through a linear layer followed by BatchNorm, ReLU activation, and max pooling, eventually yields (C, H, W) data.
Before the VFE, batch_dict holds the following data:
'''
batch_dict:
    points: (N, 5) --> (batch_index, x, y, z, r); batch_index is the index of the point cloud within the current batch
    frame_id: (batch_size,) --> frame IDs; here we store the absolute paths of the npy files, one per sample
    gt_boxes: (batch_size, N, 8) --> (x, y, z, dx, dy, dz, ry, class)
    use_lead_xyz: (batch_size,) --> (1, 1, 1, 1), batch_size ones
    voxels: (M, 32, 4) --> (x, y, z, r)
    voxel_coords: (M, 4) --> (batch_index, z, y, x); batch_index is the index of the point cloud within the current batch
    voxel_num_points: (M,) --> number of points inside each voxel
    batch_size: the batch size
'''
The VFE then transforms the raw point representation (N, 4) into a (D, P, N) tensor, where D is the per-point feature dimension, i.e. 10 features per point (the paper uses only 9), P is the number of non-empty pillars, and N is the maximum number of points per pillar. Concretely:
- D = (x, y, z, r, x_c, y_c, z_c, x_p, y_p, z_p): x, y, z are the point's raw coordinates and r its reflectance; the subscript c denotes the offset of the point from the mean of all points in its pillar; the subscript p denotes the offset of the point from the center of its pillar.
- P: the number of non-empty pillars, capped by MAX_NUMBER_OF_VOXELS in the yaml config.
- N: the maximum number of points per pillar; 32 in practice.
Given the (D, P, N) tensor, a simplified PointNet extracts features from the point data: an MLP lifts each point to a higher dimension (a linear layer followed by BatchNorm and ReLU), producing a (C, P, N) tensor, after which max pooling keeps, for each pillar, the point feature that best represents it. The output shape thus evolves as (C, P, N) -> (C, P) -> (C, H, W).
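To make the decoration step concrete, here is a minimal sketch that builds the 10-dimensional features for a single toy pillar (the point values and the pillar center are made up for illustration; the real computation appears in pillar_vfe.py below):

import torch

# Toy pillar: 32 points with raw features (x, y, z, r); values are made up.
points = torch.rand(32, 4)

# Offsets from the mean of all points in the pillar: (x_c, y_c, z_c).
f_cluster = points[:, :3] - points[:, :3].mean(dim=0, keepdim=True)

# Offsets from the pillar's geometric center: (x_p, y_p, z_p); the center
# (0.5, 0.5, 0.5) is an arbitrary placeholder for this sketch.
pillar_center = torch.tensor([0.5, 0.5, 0.5])
f_center = points[:, :3] - pillar_center

# Decorated features (x, y, z, r, x_c, y_c, z_c, x_p, y_p, z_p) -> D = 10.
features = torch.cat([points, f_cluster, f_center], dim=-1)
print(features.shape)  # torch.Size([32, 10])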
This code lives in pcdet/models/backbones_3d/vfe/pillar_vfe.py; the annotated code is as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F
from .vfe_template import VFETemplate
class PFNLayer(nn.Module):
    '''
    One Pillar Feature Net layer: Linear -> BatchNorm1d -> ReLU -> max pool.
    For non-final layers the output width is halved, because the max-pooled
    pillar feature is concatenated back onto every point feature.
    '''
    def __init__(self,
                 in_channels,
                 out_channels,
                 use_norm=True,
                 last_layer=False):
        super().__init__()
        self.last_vfe = last_layer
        self.use_norm = use_norm
        if not self.last_vfe:
            out_channels = out_channels // 2
        if self.use_norm:
            self.linear = nn.Linear(in_channels, out_channels, bias=False)
            self.norm = nn.BatchNorm1d(out_channels, eps=1e-3, momentum=0.01)
        else:
            self.linear = nn.Linear(in_channels, out_channels, bias=True)
        # Process at most this many pillars per linear call to bound memory usage.
        self.part = 50000
    def forward(self, inputs):
        # inputs: (M, 32, D) -- M pillars, up to 32 points each, D features per point.
        if inputs.shape[0] > self.part:
            # Split very large pillar batches to avoid one oversized matmul.
            num_parts = inputs.shape[0] // self.part
            part_linear_out = [self.linear(inputs[num_part*self.part:(num_part+1)*self.part])
                               for num_part in range(num_parts+1)]
            x = torch.cat(part_linear_out, dim=0)
        else:
            x = self.linear(inputs)
        # BatchNorm1d expects channels in dim 1, hence the permutes.
        torch.backends.cudnn.enabled = False
        x = self.norm(x.permute(0, 2, 1)).permute(0, 2, 1) if self.use_norm else x
        torch.backends.cudnn.enabled = True
        x = F.relu(x)
        # Max pool over the points of each pillar: (M, 32, C) -> (M, 1, C).
        x_max = torch.max(x, dim=1, keepdim=True)[0]
        if self.last_vfe:
            return x_max
        else:
            # Intermediate layer: append the pillar-wise max back onto every point.
            x_repeat = x_max.repeat(1, inputs.shape[1], 1)
            x_concatenated = torch.cat([x, x_repeat], dim=2)
            return x_concatenated
class PillarVFE(VFETemplate):
def __init__(self, model_cfg, num_point_features, voxel_size, point_cloud_range, **kwargs):
super().__init__(model_cfg=model_cfg)
        self.use_norm = self.model_cfg.USE_NORM
        self.with_distance = self.model_cfg.WITH_DISTANCE
        self.use_absolute_xyz = self.model_cfg.USE_ABSLOTE_XYZ  # note: the config key is spelled this way upstream
        # 4 raw features (x, y, z, r) + 6 decoration features (x_c, y_c, z_c, x_p, y_p, z_p) = 10.
        num_point_features += 6 if self.use_absolute_xyz else 3
        if self.with_distance:
            num_point_features += 1
        self.num_filters = self.model_cfg.NUM_FILTERS
        assert len(self.num_filters) > 0
        num_filters = [num_point_features] + list(self.num_filters)
        pfn_layers = []
        for i in range(len(num_filters) - 1):
            in_filters = num_filters[i]
            out_filters = num_filters[i + 1]
            pfn_layers.append(
                PFNLayer(in_filters, out_filters, self.use_norm, last_layer=(i >= len(num_filters) - 2))
            )
        self.pfn_layers = nn.ModuleList(pfn_layers)
        # Physical pillar size and the offset from grid index to pillar center (in meters).
        self.voxel_x = voxel_size[0]
        self.voxel_y = voxel_size[1]
        self.voxel_z = voxel_size[2]
        self.x_offset = self.voxel_x / 2 + point_cloud_range[0]
        self.y_offset = self.voxel_y / 2 + point_cloud_range[1]
        self.z_offset = self.voxel_z / 2 + point_cloud_range[2]
def get_output_feature_dim(self):
return self.num_filters[-1]
    def get_paddings_indicator(self, actual_num, max_num, axis=0):
        '''
        Computes a mask that marks which entries of a pillar are real points
        and which are zero padding.
        '''
        actual_num = torch.unsqueeze(actual_num, axis + 1)
        max_num_shape = [1] * len(actual_num.shape)
        max_num_shape[axis + 1] = -1
        max_num = torch.arange(max_num, dtype=torch.int, device=actual_num.device).view(max_num_shape)
        # True where the running point index is below the pillar's actual point count.
        paddings_indicator = actual_num.int() > max_num
        return paddings_indicator
    def forward(self, batch_dict, **kwargs):
        '''
        batch_dict:
            points: (N, 5) --> (batch_index, x, y, z, r); batch_index is the index of the point cloud within the current batch
            frame_id: (batch_size,) --> frame IDs; here the absolute paths of the npy files, one per sample
            gt_boxes: (batch_size, N, 8) --> (x, y, z, dx, dy, dz, ry, class)
            use_lead_xyz: (batch_size,) --> (1, 1, 1, 1), batch_size ones
            voxels: (M, 32, 4) --> (x, y, z, r)
            voxel_coords: (M, 4) --> (batch_index, z, y, x); batch_index is the index of the point cloud within the current batch
            voxel_num_points: (M,) --> number of points inside each voxel
            batch_size: the batch size (e.g. 4)
        '''
        voxel_features, voxel_num_points, coords = batch_dict['voxels'], batch_dict['voxel_num_points'], batch_dict['voxel_coords']
        # Mean of the real points in each pillar (the zero padding does not bias the sum).
        points_mean = voxel_features[:, :, :3].sum(dim=1, keepdim=True) / voxel_num_points.type_as(voxel_features).view(-1, 1, 1)
        # Offsets from the pillar's point mean: (x_c, y_c, z_c).
        f_cluster = voxel_features[:, :, :3] - points_mean
        f_center = torch.zeros_like(voxel_features[:, :, :3])
        '''
        coords holds each pillar's index on the [432, 496, 1] grid; multiplying by the
        pillar size converts grid indices to metric coordinates, and adding half a pillar
        size gives the pillar's center point. Subtracting that center from each point's
        x, y, z yields the point's offset from its pillar center: (x_p, y_p, z_p).
        '''
        f_center[:, :, 0] = voxel_features[:, :, 0] - (coords[:, 3].to(voxel_features.dtype).unsqueeze(1) * self.voxel_x + self.x_offset)
        f_center[:, :, 1] = voxel_features[:, :, 1] - (coords[:, 2].to(voxel_features.dtype).unsqueeze(1) * self.voxel_y + self.y_offset)
        f_center[:, :, 2] = voxel_features[:, :, 2] - (coords[:, 1].to(voxel_features.dtype).unsqueeze(1) * self.voxel_z + self.z_offset)
        if self.use_absolute_xyz:
            features = [voxel_features, f_cluster, f_center]  # 4 + 3 + 3 = 10 features per point
        else:
            features = [voxel_features[..., 3:], f_cluster, f_center]
        if self.with_distance:
            # Optional extra feature: Euclidean distance of each point to the origin.
            points_dist = torch.norm(voxel_features[:, :, :3], 2, 2, keepdim=True)
            features.append(points_dist)
        features = torch.cat(features, dim=-1)
        voxel_count = features.shape[1]
        # Zero out the decorated features of the padded (non-real) points.
        mask = self.get_paddings_indicator(voxel_num_points, voxel_count, axis=0)
        mask = torch.unsqueeze(mask, -1).type_as(voxel_features)
        features *= mask
        for pfn in self.pfn_layers:
            features = pfn(features)
        # (M, 1, 64) -> (M, 64)
        features = features.squeeze()
        batch_dict['pillar_features'] = features
        return batch_dict
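As a quick sanity check of the shapes (a hedged sketch using the PFNLayer class above with a random input tensor; a single 10 -> 64 layer mirrors the default NUM_FILTERS: [64] setting):

import torch

pfn = PFNLayer(in_channels=10, out_channels=64, use_norm=True, last_layer=True)
pfn.eval()  # eval mode so BatchNorm1d uses its default running statistics

dummy = torch.rand(100, 32, 10)  # 100 pillars, 32 points each, 10 decorated features
out = pfn(dummy)                 # max pooled over the 32 points of each pillar
print(out.shape)                 # torch.Size([100, 1, 64])
print(out.squeeze().shape)       # torch.Size([100, 64]) -> stored as pillar_features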
2. MAP_TO_BEV
Function: project the pillar data onto 2D coordinates.
After the simplified PointNet has extracted a feature for each pillar, every pillar needs to be put back at its original position in the 2D coordinate system, forming pseudo-image data.
This corresponds to the "stacked pillars" step in the paper: the generated pillars are restored to their positions in the original space according to their coordinate indices.
This code lives in pcdet/models/backbones_2d/map_to_bev/pointpillar_scatter.py; the annotated code is as follows:
import torch
import torch.nn as nn
class PointPillarScatter(nn.Module):
    def __init__(self, model_cfg, grid_size, **kwargs):
        super().__init__()
        self.model_cfg = model_cfg
        self.num_bev_features = self.model_cfg.NUM_BEV_FEATURES  # 64
        self.nx, self.ny, self.nz = grid_size                    # 432, 496, 1
        assert self.nz == 1
    def forward(self, batch_dict, **kwargs):
        '''
        batch_dict['pillar_features']: (M, 64), the output of the VFE
        voxel_coords: (M, 4) --> (batch_index, z, y, x); batch_index is the index of the point cloud within the current batch
        '''
        pillar_features, coords = batch_dict['pillar_features'], batch_dict['voxel_coords']
        batch_spatial_features = []
        batch_size = coords[:, 0].max().int().item() + 1
        for batch_idx in range(batch_size):
            # Empty canvas for this sample: (64, nz * nx * ny), all zeros.
            spatial_feature = torch.zeros(
                self.num_bev_features,
                self.nz * self.nx * self.ny,
                dtype=pillar_features.dtype,
                device=pillar_features.device)
            # Select the pillars that belong to this sample.
            batch_mask = coords[:, 0] == batch_idx
            this_coords = coords[batch_mask, :]
            # coords are (batch, z, y, x); with nz == 1 the z index is 0,
            # so the flat canvas index reduces to y * nx + x.
            indices = this_coords[:, 1] + this_coords[:, 2] * self.nx + this_coords[:, 3]
            indices = indices.type(torch.long)
            pillars = pillar_features[batch_mask, :]
            pillars = pillars.t()
            # Scatter the (64, num_pillars) features into the flat canvas.
            spatial_feature[:, indices] = pillars
            batch_spatial_features.append(spatial_feature)
        batch_spatial_features = torch.stack(batch_spatial_features, 0)
        # Reshape the flat canvas into the pseudo-image: (batch_size, 64, ny, nx).
        batch_spatial_features = batch_spatial_features.view(batch_size, self.num_bev_features * self.nz, self.ny, self.nx)
        batch_dict['spatial_features'] = batch_spatial_features
        return batch_dict
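The heart of the scatter is the flat index y * nx + x. Here is a minimal standalone sketch of the same idea on a made-up toy grid (not the library call itself):

import torch

# Toy BEV grid: nx = 4, ny = 3, C = 2 feature channels per pillar.
nx, ny, C = 4, 3, 2
canvas = torch.zeros(C, ny * nx)

# Two non-empty pillars at (y=0, x=1) and (y=2, x=3) and their features.
ys = torch.tensor([0, 2])
xs = torch.tensor([1, 3])
pillar_feats = torch.tensor([[1.0, 10.0], [2.0, 20.0]])  # (num_pillars, C)

indices = ys * nx + xs                 # flat index, as in PointPillarScatter
canvas[:, indices] = pillar_feats.t()  # scatter (C, num_pillars) into the canvas
bev = canvas.view(C, ny, nx)           # reshape into the pseudo-image (C, H, W)
print(bev[0])
# tensor([[0., 1., 0., 0.],
#         [0., 0., 0., 0.],
#         [0., 0., 0., 2.]])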
3. BACKBONE_2D
Function: the backbone network, which extracts features from the pseudo-image.
After the mapping above places each pillar's max-pooled feature back at its coordinates, we obtain image-like data. Only coordinates with a non-empty pillar carry extracted point-cloud features; everywhere else is zero, so the resulting (batch_size, 64, 496, 432) tensor is still very sparse.
BACKBONE_2D takes input features of shape (batch_size, 64, 496, 432) and outputs features of shape (batch_size, 384, 248, 216).
Note that the backbone builds both a downsampling path and an upsampling path, stored in blocks and deblocks respectively; the details of both are in the code and comments below.
This code lives in pcdet/models/backbones_2d/base_bev_backbone.py; the annotated code is as follows:
import numpy as np
import torch
import torch.nn as nn
class BaseBEVBackbone(nn.Module):
def __init__(self, model_cfg, input_channels):
super().__init__()
self.model_cfg = model_cfg
if self.model_cfg.get('LAYER_NUMS', None) is not None:
assert len(self.model_cfg.LAYER_NUMS) == len(self.model_cfg.LAYER_STRIDES) == len(self.model_cfg.NUM_FILTERS)
layer_nums = self.model_cfg.LAYER_NUMS
layer_strides = self.model_cfg.LAYER_STRIDES
num_filters = self.model_cfg.NUM_FILTERS
else:
layer_nums = layer_strides = num_filters = []
if self.model_cfg.get('UPSAMPLE_STRIDES', None) is not None:
assert len(self.model_cfg.UPSAMPLE_STRIDES) == len(self.model_cfg.NUM_UPSAMPLE_FILTERS)
num_upsample_filters = self.model_cfg.NUM_UPSAMPLE_FILTERS
upsample_strides = self.model_cfg.UPSAMPLE_STRIDES
else:
upsample_strides = num_upsample_filters = []
num_levels = len(layer_nums)
c_in_list = [input_channels, *num_filters[:-1]]
self.blocks = nn.ModuleList()
self.deblocks = nn.ModuleList()
        for idx in range(num_levels):
            # Each block: a strided conv that downsamples, followed by
            # layer_nums[idx] stride-1 convs, each Conv2d -> BatchNorm2d -> ReLU.
            cur_layers = [
                nn.ZeroPad2d(1),
                nn.Conv2d(
                    c_in_list[idx], num_filters[idx], kernel_size=3,
                    stride=layer_strides[idx], padding=0, bias=False
                ),
                nn.BatchNorm2d(num_filters[idx], eps=1e-3, momentum=0.01),
                nn.ReLU()
            ]
            for k in range(layer_nums[idx]):
                cur_layers.extend([
                    nn.Conv2d(num_filters[idx], num_filters[idx], kernel_size=3, padding=1, bias=False),
                    nn.BatchNorm2d(num_filters[idx], eps=1e-3, momentum=0.01),
                    nn.ReLU()
                ])
            self.blocks.append(nn.Sequential(*cur_layers))
            if len(upsample_strides) > 0:
                stride = upsample_strides[idx]
                if stride >= 1:
                    # Upsample this level back with a transposed convolution.
                    self.deblocks.append(nn.Sequential(
                        nn.ConvTranspose2d(
                            num_filters[idx], num_upsample_filters[idx],
                            upsample_strides[idx],
                            stride=upsample_strides[idx], bias=False
                        ),
                        nn.BatchNorm2d(num_upsample_filters[idx], eps=1e-3, momentum=0.01),
                        nn.ReLU()
                    ))
                else:
                    # A fractional stride means further downsampling with a plain conv.
                    stride = int(np.round(1 / stride))  # np.int is deprecated in recent numpy
                    self.deblocks.append(nn.Sequential(
                        nn.Conv2d(
                            num_filters[idx], num_upsample_filters[idx],
                            stride,
                            stride=stride, bias=False
                        ),
                        nn.BatchNorm2d(num_upsample_filters[idx], eps=1e-3, momentum=0.01),
                        nn.ReLU()
                    ))
        # Channel count of the concatenated output (3 x 128 = 384 here).
        c_in = sum(num_upsample_filters)
        # Optional extra deblock if one more upsample stride is configured.
        if len(upsample_strides) > num_levels:
            self.deblocks.append(nn.Sequential(
                nn.ConvTranspose2d(c_in, c_in, upsample_strides[-1], stride=upsample_strides[-1], bias=False),
                nn.BatchNorm2d(c_in, eps=1e-3, momentum=0.01),
                nn.ReLU(),
            ))
self.num_bev_features = c_in
    def forward(self, data_dict):
        """
        Args:
            data_dict:
                spatial_features: (batch_size, 64, 496, 432)
        Returns:
        """
        spatial_features = data_dict['spatial_features']
        ups = []
        ret_dict = {}
        x = spatial_features
        for i in range(len(self.blocks)):
            x = self.blocks[i](x)
            # Record the feature map at each downsampling factor (2x, 4x, 8x).
            stride = int(spatial_features.shape[2] / x.shape[2])
            ret_dict['spatial_features_%dx' % stride] = x
            # Upsample every level back to a common resolution before concatenation.
            if len(self.deblocks) > 0:
                ups.append(self.deblocks[i](x))
            else:
                ups.append(x)
        # Concatenate the upsampled levels along the channel dim: 3 x 128 = 384 here.
        if len(ups) > 1:
            x = torch.cat(ups, dim=1)
        elif len(ups) == 1:
            x = ups[0]
        # Optional extra deblock applied to the concatenated map.
        if len(self.deblocks) > len(self.blocks):
            x = self.deblocks[-1](x)
        data_dict['spatial_features_2d'] = x
        return data_dict
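To see the down/upsampling shapes end to end, here is a hedged usage sketch of BaseBEVBackbone; the config values are assumed to match the default pointpillar.yaml, and EasyDict merely stands in for the real config object:

import torch
from easydict import EasyDict

# Assumed default PointPillars backbone config (from pointpillar.yaml).
cfg = EasyDict({
    'LAYER_NUMS': [3, 5, 5],
    'LAYER_STRIDES': [2, 2, 2],
    'NUM_FILTERS': [64, 128, 256],
    'UPSAMPLE_STRIDES': [1, 2, 4],
    'NUM_UPSAMPLE_FILTERS': [128, 128, 128],
})
backbone = BaseBEVBackbone(cfg, input_channels=64)
backbone.eval()

data_dict = {'spatial_features': torch.rand(1, 64, 496, 432)}
out = backbone(data_dict)['spatial_features_2d']
# The three blocks downsample to 1/2, 1/4, 1/8 resolution; the three deblocks
# upsample each back to 1/2 resolution with 128 channels, and concatenation
# gives 3 x 128 = 384 channels.
print(out.shape)  # torch.Size([1, 384, 248, 216])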
4. DENSE_HEAD
Function:
There are three classes of anchors. Every anchor has two orientations, 0° and 90° in the BEV view, and each class has a single anchor size: Car [3.9, 1.6, 1.56], Pedestrian [0.8, 0.6, 1.73], and Cyclist [1.76, 0.6, 1.73] (units: meters). The Car anchor configuration looks like this:
{
'class_name': 'Car',
'anchor_sizes': [[3.9, 1.6, 1.56]],
'anchor_rotations': [0, 1.57],
'anchor_bottom_heights': [-1.78],
'align_center': False,
'feature_map_stride': 2,
'matched_threshold': 0.6,
'unmatched_threshold': 0.45
},
When matching anchors to ground-truth boxes, 2D IoU matching is used: anchors are matched directly on the generated feature map, i.e. in the BEV view, without considering height information.
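A minimal sketch of that matching rule with the Car thresholds above (the IoU values are made up; the real assignment lives in OpenPCDet's target assigner and also handles class labels and best-match anchors):

import torch

ious = torch.tensor([0.70, 0.50, 0.30])  # made-up BEV IoUs for three anchors
matched, unmatched = 0.6, 0.45           # Car thresholds from the config above
labels = torch.full_like(ious, -1)       # -1 = ignored during training
labels[ious >= matched] = 1              # positive (foreground) anchor
labels[ious < unmatched] = 0             # negative (background) anchor
print(labels)                            # tensor([ 1., -1.,  0.])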
Each anchor regresses 7 parameters, (x, y, z, w, l, h, θ), plus a class prediction cls.
- x, y, z predict the anchor's center position in the point cloud;
- w, l, h predict the anchor's width, length, and height;
- θ predicts the box's rotation angle. Because two boxes facing exactly opposite directions are hard to tell apart from the angle regression alone, the PointPillars detection head additionally predicts a direction class for each anchor, using a softmax over the box's two possible headings.
- cls is the predicted class.
This code lives in pcdet/models/dense_heads/anchor_head_single.py; the annotated code is as follows:
import numpy as np
import torch.nn as nn
from .anchor_head_template import AnchorHeadTemplate
class AnchorHeadSingle(AnchorHeadTemplate):
    '''
    Args:
        model_cfg: configuration for AnchorHeadSingle
        input_channels: 384, the number of input channels
        num_class: 3
        class_names: ['Car', 'Pedestrian', 'Cyclist']
        grid_size: (432, 496, 1)
        point_cloud_range: (0, -39.68, -3, 69.12, 39.68, 1)
        predict_boxes_when_training: False
    '''
def __init__(self, model_cfg, input_channels, num_class, class_names, grid_size, point_cloud_range,
predict_boxes_when_training=True, **kwargs):
super().__init__(
model_cfg=model_cfg, num_class=num_class, class_names=class_names, grid_size=grid_size, point_cloud_range=point_cloud_range,
predict_boxes_when_training=predict_boxes_when_training
)
        # Anchors per feature-map location: 3 classes x 2 rotations = 6.
        self.num_anchors_per_location = sum(self.num_anchors_per_location)
        # 1x1 conv for classification: 6 anchors x 3 classes = 18 output channels.
        self.conv_cls = nn.Conv2d(
            input_channels, self.num_anchors_per_location * self.num_class,
            kernel_size=1
        )
        # 1x1 conv for box regression: 6 anchors x 7 box parameters = 42 output channels.
        self.conv_box = nn.Conv2d(
            input_channels, self.num_anchors_per_location * self.box_coder.code_size,
            kernel_size=1
        )
        if self.model_cfg.get('USE_DIRECTION_CLASSIFIER', None) is not None:
            # 1x1 conv for direction classification: 6 anchors x 2 bins = 12 output channels.
            self.conv_dir_cls = nn.Conv2d(
                input_channels,
                self.num_anchors_per_location * self.model_cfg.NUM_DIR_BINS,
                kernel_size=1
            )
        else:
            self.conv_dir_cls = None
        self.init_weights()
    def init_weights(self):
        pi = 0.01
        # Focal-loss style bias init: the initial foreground probability is about 0.01.
        nn.init.constant_(self.conv_cls.bias, -np.log((1 - pi) / pi))
        nn.init.normal_(self.conv_box.weight, mean=0, std=0.001)
    def forward(self, data_dict):
        spatial_features_2d = data_dict['spatial_features_2d']
        # (batch, 18, 248, 216) and (batch, 42, 248, 216)
        cls_preds = self.conv_cls(spatial_features_2d)
        box_preds = self.conv_box(spatial_features_2d)
        # Move channels last: (batch, H, W, C), so each location's anchors are contiguous.
        cls_preds = cls_preds.permute(0, 2, 3, 1).contiguous()
        box_preds = box_preds.permute(0, 2, 3, 1).contiguous()
        self.forward_ret_dict['cls_preds'] = cls_preds
        self.forward_ret_dict['box_preds'] = box_preds
        if self.conv_dir_cls is not None:
            dir_cls_preds = self.conv_dir_cls(spatial_features_2d)
            dir_cls_preds = dir_cls_preds.permute(0, 2, 3, 1).contiguous()
            self.forward_ret_dict['dir_cls_preds'] = dir_cls_preds
        else:
            dir_cls_preds = None
        if self.training:
            # Assign classification and regression targets by matching anchors to GT boxes.
            targets_dict = self.assign_targets(
                gt_boxes=data_dict['gt_boxes']
            )
            self.forward_ret_dict.update(targets_dict)
        if not self.training or self.predict_boxes_when_training:
            # Decode the predicted residuals into absolute boxes for inference.
            batch_cls_preds, batch_box_preds = self.generate_predicted_boxes(
                batch_size=data_dict['batch_size'],
                cls_preds=cls_preds, box_preds=box_preds, dir_cls_preds=dir_cls_preds
            )
            data_dict['batch_cls_preds'] = batch_cls_preds
            data_dict['batch_box_preds'] = batch_box_preds
            data_dict['cls_preds_normalized'] = False
        return data_dict
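A quick worked example of the head's output channel counts under the default setup (a sketch: 3 classes x 2 rotations = 6 anchors per location, 7 box parameters, 2 direction bins):

num_anchors_per_location = 3 * 2  # 3 classes, 2 rotations (0 and 90 degrees)
num_class, code_size, num_dir_bins = 3, 7, 2

print(num_anchors_per_location * num_class)     # 18 -> conv_cls output channels
print(num_anchors_per_location * code_size)     # 42 -> conv_box output channels
print(num_anchors_per_location * num_dir_bins)  # 12 -> conv_dir_cls output channels

So on a (batch, 384, 248, 216) input, conv_cls, conv_box, and conv_dir_cls produce (batch, 18, 248, 216), (batch, 42, 248, 216), and (batch, 12, 248, 216) maps respectively.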
Original: https://blog.csdn.net/QLeelq/article/details/117328660
Author: 非晚非晚
Title: Reading the PointPillars code in OpenPCDet, Part 2: network structure (openpcdet之pointpillar代码阅读——第二篇:网络结构)