Object Detection Study Notes: mmdetection


I. Setting the Learning Rate

The learning rate is set in the configs/*.py files.

The default lr was tuned for 8 GPUs, so with fewer GPUs it should be scaled down accordingly.
mmdetection sets lr = 0.02 assuming 8 GPUs with samples_per_gpu=2 (16 images per iteration); your own learning rate should be [0.02 / (8 * 2)] * samples_per_gpu * gpus.
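A minimal sketch of this linear scaling rule (the GPU count and samples_per_gpu below are only example values):

# baseline: lr = 0.02 was tuned for 8 GPUs x 2 samples per GPU = 16 images per iteration
base_lr = 0.02
base_batch_size = 8 * 2

gpus = 1               # your GPU count
samples_per_gpu = 2    # as set in the data dict of the config

lr = base_lr / base_batch_size * gpus * samples_per_gpu
print(lr)              # 0.0025 for a single GPU with 2 samples per GPU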

II. Training on Your Own Dataset

References:
1. https://zhuanlan.zhihu.com/p/76191492
2. https://github.com/open-mmlab/mmdetection/blob/master/demo/MMDet_Tutorial.ipynb
To train a new detector, there are usually three things to do:

  1. Support a new dataset
  2. Modify the config
  3. Train a new detector

In short: to train a new detector you build your own dataset, modify the config file, and then train.

There are three ways to support a new dataset in MMDetection:

  1. reorganize the dataset into COCO format.

  2. reorganize the dataset into a middle format.

  3. implement a new dataset.

There are three ways to support a new dataset:

  1. Convert the dataset into COCO format and place it under the expected directory layout. You also need to modify CLASSES in mmdetection/mmdet/datasets/coco.py, and in the config file change num_classes in the model dict, img_scale in the data dict (the longest and shortest sides of the input image), and lr in the optimizer. Then update coco_classes in mmdetection/mmdet/core/evaluation/class_names.py; this determines the class names shown on the result images at test time.
  2. Convert the dataset into the middle format.
  3. Implement a new Dataset class that inherits from CustomDataset.

If you take the second approach, the annotations have to be converted into the middle format accepted by MMDetection, sketched below:

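For reference, a minimal sketch of that middle format, following the field names in the official MMDetection tutorial (the concrete values are made up):

import numpy as np

middle_format = [
    dict(
        filename='a.jpg',
        width=1280,
        height=720,
        ann=dict(
            bboxes=np.array([[0., 0., 100., 120.]], dtype=np.float32),  # (n, 4), x1, y1, x2, y2
            labels=np.array([0], dtype=np.int64),                       # (n, ), class indices
            # optional: bboxes_ignore / labels_ignore for boxes to ignore
        )),
]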
With the annotations in this format the original load_annotations mechanism can be reused, but you need to write your own Dataset class that inherits CustomDataset (the file goes in mmdet/datasets/*.py), as sketched below. In that case you no longer need to modify CLASSES in mmdetection/mmdet/datasets/coco.py, because your new Dataset class is used instead.
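A minimal sketch of such a class, loosely following the official tutorial notebook; the annotation layout assumed here (one 'filename x1 y1 x2 y2 label' line per box) and the fixed image size are purely illustrative:

import numpy as np

from mmdet.datasets.builder import DATASETS
from mmdet.datasets.custom import CustomDataset


@DATASETS.register_module()
class MyDataset(CustomDataset):
    CLASSES = ('starfish',)  # a single class still needs the trailing comma so this stays a tuple

    def load_annotations(self, ann_file):
        """Parse a plain-text annotation file into the middle format."""
        data_infos = []
        for line in open(ann_file):
            filename, x1, y1, x2, y2, label = line.split()
            data_infos.append(dict(
                filename=filename,
                width=1280,   # illustrative; read the real image size in practice
                height=720,
                ann=dict(
                    bboxes=np.array([[float(x1), float(y1), float(x2), float(y2)]],
                                    dtype=np.float32),
                    labels=np.array([int(label)], dtype=np.int64))))
        return data_infos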
Then modify the config file. You can write your own config; just remember to update the file paths.

Tested: training and prediction on my own dataset worked with the steps below (a sketch of the edits follows this list).

  1. Create a new dataset .py file under mmdet/datasets/; copying and adapting coco.py works. Rename the class to your own dataset and set CLASSES to your own categories. Note that a single class still needs a trailing comma so that CLASSES remains a tuple.
  2. In mmdet/datasets/__init__.py, add your dataset name to __all__ and add from .MyDataset (your dataset's file name) import MyDataset (your dataset's class name).
  3. Reinstall mmdet so the dataset is registered into DATASETS: run pip install -v -e . and do not forget the trailing dot.
  4. Update num_classes and classes in the config file. Contrary to what some posts say, I did not need to modify mmdet/core/evaluation/class_names.py and still got my own class names in the predictions.
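A sketch of the registration edits from the list above (file and class names are placeholders):

# mmdet/datasets/my_dataset.py  (new file, adapted from coco.py)
from .builder import DATASETS
from .coco import CocoDataset


@DATASETS.register_module()
class MyDataset(CocoDataset):
    CLASSES = ('starfish',)  # the trailing comma keeps this a tuple


# mmdet/datasets/__init__.py  (lines to add)
# from .my_dataset import MyDataset
# __all__ = [..., 'MyDataset']

After these edits, run pip install -v -e . from the mmdetection root so the new class is actually registered.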
On Kaggle I also succeeded in adding my own dataset with the method below, which registers a subclass of CocoDataset and overrides CLASSES; elsewhere, however, it showed up as not registered.

@DATASETS.register_module()
class UnderwaterDataset(CocoDataset):
    CLASSES = ('starfish',)  # keep CLASSES a tuple; note the trailing comma for a single class

III. Error Log

1. ValueError: need at least one array to concatenate
When this error appears, check whether the dataset is actually being read (I verified with my configs/cascade_rcnn/underwater.py config); if nothing is loaded it is almost certainly a dataset problem. In my case the name entries under categories in the dataset JSON did not match the CLASSES in the config file and in the Dataset class.
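A quick way to check for such a mismatch (a minimal sketch; the annotation path and class tuple are placeholders):

import json

CLASSES = ('starfish',)  # the tuple used in your Dataset class / config

ann = json.load(open('annotation_train.json'))
json_names = {c['name'] for c in ann['categories']}
print(sorted(json_names))                              # category names in the annotation file
print(set(CLASSES) - json_names or 'all CLASSES found in the json')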

2. AttributeError: 'ConfigDict' object has no attribute 'train_cfg' during custom dataset training
Link: https://github.com/open-mmlab/mmdetection/issues/4603
Fix: use model = build_detector(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg')) instead of model = build_detector(cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg).

3. KeyError: 'xxx is not in the dataset registry'
Link: https://github.com/open-mmlab/mmdetection/issues/3751
Create a .py file for your dataset under mmdet/datasets/ (coco.py is a good template) and register it:

@DATASETS.register_module()
class MyDataset(CocoDataset):
    ...

Then in __init__.py add

from .my_dataset import MyDataset

and add MyDataset to __all__. Finally reinstall so the registration takes effect:

pip install -v -e .
4. has no key: init_cfg

5. TypeError: CascadeRCNN: ResNet: __init__() got an unexpected keyword argument 'groups'
Cause: ResNet does not take a groups argument; I had written ResNet in the config where it should have been ResNeXt.

6. TypeError: CascadeRCNN: SwinTransformer: __init__() got an unexpected keyword argument '_delete_'
Cause: if your config does not inherit from a base config, do not add _delete_.
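For context, _delete_ only makes sense in a config that inherits from a base config and wants to replace an inherited dict wholesale instead of merging into it. A sketch (the _base_ file name is illustrative; the backbone keys are the ones used in the config below):

_base_ = './cascade_rcnn_r50_fpn_1x_coco.py'  # _delete_ requires a base config to delete from

model = dict(
    backbone=dict(
        _delete_=True,  # drop all inherited backbone keys instead of merging with them
        type='ResNeXt',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        groups=64,
        base_width=4,
        init_cfg=dict(type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d')))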

IV. Multi-Scale Training

The term "multiscale training" is adopted in many papers and indicates resizing images to different scales at each iteration. In the challenge, we use the setting [(400, 1600), (1400, 1600)], which means the short edge is randomly sampled from 400~1400 and the long edge is fixed as 1600.

From the source code there are two modes, range and value. In range mode exactly two scales are given: the short edge is sampled from the range spanned by the two scales' short edges, and the long edge is a random integer drawn from the range spanned by the two scales' long edges. In value mode any number of input scales can be listed, and one of them is picked at random as the resize target.
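A sketch of the two modes as they would appear in the train pipeline (the scales are just examples):

# 'range' mode: exactly two scales; short and long edges are each sampled from the spanned ranges
dict(type='Resize',
     img_scale=[(1333, 640), (1333, 800)],
     multiscale_mode='range',
     keep_ratio=True)

# 'value' mode: any number of scales; one tuple is picked at random each iteration
dict(type='Resize',
     img_scale=[(1333, 640), (1333, 720), (1333, 800)],
     multiscale_mode='value',
     keep_ratio=True)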


V. GC (global context)

https://zhuanlan.zhihu.com/p/102817180

VI. Code Walkthrough

Official documentation:

https://mmdetection.readthedocs.io/zh_CN/latest/tutorials/config.html

1. The config file

References: Tricks in object detection competitions: https://zhuanlan.zhihu.com/p/102817180
MMDetection common algorithms explained: Faster R-CNN: https://zhuanlan.zhihu.com/p/422456194

# model settings
model = dict(
    type='CascadeRCNN',  # detector (algorithm) name
    backbone=dict(
        type='ResNeXt',  # backbone type
        depth=101,  # backbone depth
        num_stages=4,  # number of ResNet stages; normally 4 (one stem plus four stage outputs)
        out_indices=(0, 1, 2, 3),  # which stage outputs to return; here all four stages are used,
        # with strides (4, 8, 16, 32) and channels (256, 512, 1024, 2048)
        frozen_stages=1,  # stages to freeze: -1 freezes nothing, 0 freezes only the stem, 1 freezes the stem and the first stage, and so on
        norm_cfg=dict(type='BN', requires_grad=True),  # normalization layer (usually BN or GN); here BN with gradient updates enabled
        norm_eval=True,  # keep all backbone BN layers in eval mode, i.e. use the pretrained global mean/var without updating them
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'),
        groups=64,
        base_width=4),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],  # channels of the feature maps from the four backbone stages
        out_channels=256,  # channels of every FPN output level
        num_outs=5),  # number of FPN output feature maps
        # FPN takes the four inputs c2-c5 and produces five outputs p2-p6 with strides (4, 8, 16, 32, 64);
        # p6 is a large-receptive-field map obtained by downsampling p5, used to detect very large objects
    rpn_head=dict(
        type='RPNHead',  # RPN head type
        in_channels=256,  # RPN input channels (equal to the FPN output channels)
        feat_channels=256,  # channels of the intermediate feature map
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],  # anchor scale; the base anchor size on each level is base_size = scale * stride
            ratios=[0.5, 1.0, 2.0],  # aspect ratios of the anchors on every feature map; can be tuned
            strides=[4, 8, 16, 32, 64]),  # anchor stride w.r.t. the original image for each level, matching the FPN feature strides;
            # if base_sizes is not set, the stride values are used as the base_sizes
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',  # see the DeltaXYWHBBoxCoder section below
            target_means=[.0, .0, .0, .0],  # the four regression targets are normalized by subtracting these means
            target_stds=[1.0, 1.0, 1.0, 1.0]),  # and dividing by these standard deviations
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    roi_head=dict(
        type='CascadeRoIHead',
        num_stages=3,
        stage_loss_weights=[1, 0.5, 0.25],
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),  # RoIAlign with output size 7, the bridge between RPN and R-CNN
            out_channels=256,  # input shape (batch, nms_post, 4); output shape (batch, nms_post, 256, roi_feat_size, roi_feat_size), 256 being the FPN output channels
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(
                type='Shared2FCBBoxHead',  # head with two shared FC layers
                in_channels=256,  # input channels, equal to the FPN output channels
                fc_out_channels=1024,  # output channels of the two shared FC layers; output shape is (batch * nms_post, 1024)
                roi_feat_size=7,  # spatial size of the RoIAlign / RoIPool output
                num_classes=1,  # number of classes
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=True,  # whether bbox regression is class-agnostic; if True the bbox branch only predicts whether a box is foreground (4 output channels) and the class is decided later from the classification scores, so one box can map to several classes; if False the branch outputs 4 * num_classes channels
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
                               loss_weight=1.0)),
            dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=1,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.05, 0.05, 0.1, 0.1]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
                               loss_weight=1.0)),
            dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=1,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.033, 0.033, 0.067, 0.067]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
        ]),
    # model training and testing settings
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,  # compute the IoU between every anchor and all gt boxes; IoU > 0.7 makes a positive sample
                neg_iou_thr=0.3,  # IoU < 0.3 makes a negative sample
                min_pos_iou=0.3,  # for every gt, the anchor with the highest IoU is kept as positive if that IoU exceeds 0.3; this can produce low-quality positives, and whether it is applied depends on match_low_quality
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,  # total number of positive + negative samples to draw
                pos_fraction=0.5,  # fraction of positives
                neg_pos_ub=-1,  # upper bound on the negative:positive ratio; -1 means negatives simply fill up the 256 when positives are insufficient
                add_gt_as_proposals=False),  # whether to add the gt boxes as proposals;
                # a training trick against having too few high-quality positives, which makes early training converge faster and more stably.
                # It is set to False in the RPN and True in the R-CNN stages, because early on the RPN does not yet supply enough positives.
            allowed_border=0,  # how many pixels an anchor may extend beyond the image border
            pos_weight=-1,  # weight of positive samples; -1 keeps the original weights
            debug=False),  # debug mode
        rpn_proposal=dict(  # proposals the RPN feeds to the R-CNN stages
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=[
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,  # all 0.5 here because the R-CNN stage already receives high-quality proposals from the RPN
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.6,
                    neg_iou_thr=0.6,
                    min_pos_iou=0.6,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.7,
                    min_pos_iou=0.7,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False)
        ]),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100)))
The shared logic above outputs classification and regression predictions for batch * nms_post candidate boxes.
All predictions are then split along the batch dimension and post-processed per image: the boxes are decoded and rescaled back to the original image size, low-scoring predictions are removed with score_thr, NMS is applied, and at most max_per_img results are kept.
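A rough sketch of that per-image post-processing, using torchvision's NMS for illustration rather than the exact mmdet implementation:

import torch
from torchvision.ops import nms

def postprocess(bboxes, scores, score_thr=0.05, iou_thr=0.5, max_per_img=100):
    """bboxes: (N, 4) already rescaled to the original image; scores: (N,) for one class."""
    keep = scores > score_thr                # drop low-confidence predictions
    bboxes, scores = bboxes[keep], scores[keep]
    keep = nms(bboxes, scores, iou_thr)      # NMS; returns kept indices sorted by score
    keep = keep[:max_per_img]                # keep at most max_per_img detections
    return bboxes[keep], scores[keep]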
# dataset settings
dataset_type = 'UnderwaterDataset'
data_root = '/kaggle/working/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'anotation_train.json',
        img_prefix=data_root + 'image/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotation_valid.json',
        img_prefix=data_root + 'image/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotation_valid.json',
        img_prefix=data_root + 'image/',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='bbox')  # how often (in epochs) to run validation, and which metric to use

# optimizer
optimizer = dict(type='SGD', lr=0.02/8, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
#runner = dict(type='EpochBasedRunner', max_epochs=12)
total_epochs =12

checkpoint_config = dict(interval=1)  # save a checkpoint every this many epochs
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
custom_hooks = [dict(type='NumClassCheckHook')]

dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]

2. DeltaXYWHBBoxCoder

mmdet/core/bbox/coder/delta_xywh_bbox_coder.py
Source:

def bbox2delta(proposals, gt, means=(0., 0., 0., 0.), stds=(1., 1., 1., 1.)):
    """Compute deltas of proposals w.r.t. gt.

    We usually compute the deltas of x, y, w, h of proposals w.r.t ground
    truth bboxes to get regression target.

    This is the inverse function of :func:delta2bbox.

    Args:
        proposals (Tensor): Boxes to be transformed, shape (N, ..., 4)
        gt (Tensor): Gt bboxes to be used as base, shape (N, ..., 4)
        means (Sequence[float]): Denormalizing means for delta coordinates
        stds (Sequence[float]): Denormalizing standard deviation for delta
            coordinates

    Returns:
        Tensor: deltas with shape (N, 4), where columns represent dx, dy,
            dw, dh.

"""
    assert proposals.size() == gt.size()

    proposals = proposals.float()
    gt = gt.float()
    px = (proposals[..., 0] + proposals[..., 2]) * 0.5
    py = (proposals[..., 1] + proposals[..., 3]) * 0.5
    pw = proposals[..., 2] - proposals[..., 0]
    ph = proposals[..., 3] - proposals[..., 1]

    gx = (gt[..., 0] + gt[..., 2]) * 0.5
    gy = (gt[..., 1] + gt[..., 3]) * 0.5
    gw = gt[..., 2] - gt[..., 0]
    gh = gt[..., 3] - gt[..., 1]

    dx = (gx - px) / pw
    dy = (gy - py) / ph
    dw = torch.log(gw / pw)
    dh = torch.log(gh / ph)
    deltas = torch.stack([dx, dy, dw, dh], dim=-1)

    means = deltas.new_tensor(means).unsqueeze(0)
    stds = deltas.new_tensor(stds).unsqueeze(0)
    deltas = deltas.sub_(means).div_(stds)

    return deltas

@mmcv.jit(coderize=True)
def delta2bbox(rois,
               deltas,
               means=(0., 0., 0., 0.),
               stds=(1., 1., 1., 1.),
               max_shape=None,
               wh_ratio_clip=16 / 1000,
               clip_border=True,
               add_ctr_clamp=False,
               ctr_clamp=32):
    """Apply deltas to shift/scale base boxes.

    Typically the rois are anchor or proposed bounding boxes and the deltas are
    network outputs used to shift/scale those boxes.

    This is the inverse function of :func:bbox2delta.

    Args:
        rois (Tensor): Boxes to be transformed. Has shape (N, 4).
        deltas (Tensor): Encoded offsets relative to each roi.
            Has shape (N, num_classes * 4) or (N, 4). Note
            N = num_base_anchors * W * H, when rois is a grid of
            anchors. Offset encoding follows [1]_.
        means (Sequence[float]): Denormalizing means for delta coordinates.
            Default (0., 0., 0., 0.).
        stds (Sequence[float]): Denormalizing standard deviation for delta
            coordinates. Default (1., 1., 1., 1.).
        max_shape (tuple[int, int]): Maximum bounds for boxes, specifies
           (H, W). Default None.
        wh_ratio_clip (float): Maximum aspect ratio for boxes. Default
            16 / 1000.
        clip_border (bool, optional): Whether clip the objects outside the
            border of the image. Default True.
        add_ctr_clamp (bool): Whether to add center clamp. When set to True,
            the center of the prediction bounding box will be clamped to
            avoid being too far away from the center of the anchor.
            Only used by YOLOF. Default False.
        ctr_clamp (int): the maximum pixel shift to clamp. Only used by YOLOF.
            Default 32.
    Returns:
        Tensor: Boxes with shape (N, num_classes * 4) or (N, 4), where 4
           represent tl_x, tl_y, br_x, br_y.

    References:
        .. [1] https://arxiv.org/abs/1311.2524

    Example:
        >>> rois = torch.Tensor([[ 0.,  0.,  1.,  1.],
        >>>                      [ 0.,  0.,  1.,  1.],
        >>>                      [ 0.,  0.,  1.,  1.],
        >>>                      [ 5.,  5.,  5.,  5.]])
        >>> deltas = torch.Tensor([[  0.,   0.,   0.,   0.],
        >>>                        [  1.,   1.,   1.,   1.],
        >>>                        [  0.,   0.,   2.,  -1.],
        >>>                        [ 0.7, -1.9, -0.5,  0.3]])
        >>> delta2bbox(rois, deltas, max_shape=(32, 32, 3))
        tensor([[0.0000, 0.0000, 1.0000, 1.0000],
                [0.1409, 0.1409, 2.8591, 2.8591],
                [0.0000, 0.3161, 4.1945, 0.6839],
                [5.0000, 5.0000, 5.0000, 5.0000]])
"""
    num_bboxes, num_classes = deltas.size(0), deltas.size(1) // 4
    if num_bboxes == 0:
        return deltas

    deltas = deltas.reshape(-1, 4)

    means = deltas.new_tensor(means).view(1, -1)
    stds = deltas.new_tensor(stds).view(1, -1)
    denorm_deltas = deltas * stds + means

    dxy = denorm_deltas[:, :2]
    dwh = denorm_deltas[:, 2:]

    rois_ = rois.repeat(1, num_classes).reshape(-1, 4)
    pxy = ((rois_[:, :2] + rois_[:, 2:]) * 0.5)
    pwh = (rois_[:, 2:] - rois_[:, :2])

    dxy_wh = pwh * dxy

    max_ratio = np.abs(np.log(wh_ratio_clip))
    if add_ctr_clamp:
        dxy_wh = torch.clamp(dxy_wh, max=ctr_clamp, min=-ctr_clamp)
        dwh = torch.clamp(dwh, max=max_ratio)
    else:
        dwh = dwh.clamp(min=-max_ratio, max=max_ratio)

    gxy = pxy + dxy_wh
    gwh = pwh * dwh.exp()
    x1y1 = gxy - (gwh * 0.5)
    x2y2 = gxy + (gwh * 0.5)
    bboxes = torch.cat([x1y1, x2y2], dim=-1)
    if clip_border and max_shape is not None:
        bboxes[..., 0::2].clamp_(min=0, max=max_shape[1])
        bboxes[..., 1::2].clamp_(min=0, max=max_shape[0])
    bboxes = bboxes.reshape(num_bboxes, -1)
    return bboxes
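A small round-trip check of the two functions above (a sketch; assumes mmdet is installed and importable):

import torch
from mmdet.core.bbox.coder.delta_xywh_bbox_coder import bbox2delta, delta2bbox

proposals = torch.tensor([[10., 10., 50., 60.]])
gts = torch.tensor([[12., 15., 48., 70.]])

deltas = bbox2delta(proposals, gts)      # encode gt boxes as (dx, dy, dw, dh) w.r.t. the proposals
decoded = delta2bbox(proposals, deltas)  # decode back to (x1, y1, x2, y2)
print(deltas)
print(decoded)                           # reproduces the gt boxes up to floating point error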

3. roi_extractor

References: Tricks in object detection competitions (updated with more code walkthroughs)
MMDetection common algorithms explained: Faster R-CNN
Config snippet:

bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32],
            gc_context=True),
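# note: gc_context is not an argument of the stock SingleRoIExtractor; it comes from the
# global-context (GC) modification referenced in Section V, which also changes forward() below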

Source: mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py (shown here with the gc_context modification and debug prints added)

    def forward(self, feats, rois, roi_scale_factor=None):
        """Forward function."""

        out_size = self.roi_layers[0].output_size
        print("self.roi_layers[0].output_size:",self.roi_layers[0].output_size)
        print("self.roi_layers:",self.roi_layers)
        num_levels = len(feats)
        expand_dims = (-1, self.out_channels * out_size[0] * out_size[1])
        if torch.onnx.is_in_onnx_export():

            roi_feats = rois[:, :1].clone().detach()
            roi_feats = roi_feats.expand(*expand_dims)
            roi_feats = roi_feats.reshape(-1, self.out_channels, *out_size)
            roi_feats = roi_feats * 0
        else:
            roi_feats = feats[0].new_zeros(
                rois.size(0), self.out_channels, *out_size)

        if torch.__version__ == 'parrots':
            roi_feats.requires_grad = True

        if num_levels == 1:
            if len(rois) == 0:
                return roi_feats
            return self.roi_layers[0](feats[0], rois)

        if self.gc_context:
            context = []
            for feat in feats:
                context.append(self.pool(feat))
        print("context[0].shape:",context[0].shape)
        print("context[1].shape:", context[1].shape)

        target_lvls = self.map_roi_levels(rois, num_levels)
        batch_size = feats[0].shape[0]

        if roi_scale_factor is not None:
            rois = self.roi_rescale(rois, roi_scale_factor)

        for i in range(num_levels):
            mask = target_lvls == i
            if torch.onnx.is_in_onnx_export():

                mask = mask.float().unsqueeze(-1)

                rois_i = rois.clone().detach()
                rois_i *= mask
                mask_exp = mask.expand(*expand_dims).reshape(roi_feats.shape)
                roi_feats_t = self.roi_layers[i](feats[i], rois_i)
                roi_feats_t *= mask_exp
                roi_feats += roi_feats_t
                continue
            inds = mask.nonzero(as_tuple=False).squeeze(1)
            if inds.numel() > 0:
                rois_ = rois[inds]
                roi_feats_t = self.roi_layers[i](feats[i], rois_)
                if self.gc_context:
                    for j in range(batch_size):
                        roi_feats_t[rois_[:, 0] == j] += context[i][j]
                roi_feats[inds] = roi_feats_t
            else:

                roi_feats += sum(
                    x.view(-1)[0]
                    for x in self.parameters()) * 0. + feats[i].sum() * 0.

        print("roi_feats:",roi_feats.shape)
        return roi_feats

So the overall roi_head flow is: the roi_extractor takes the 512 proposals generated by the RPN, shaped (512, 5) ((1024, 5) when batch_size is 2), together with the four FPN feature maps (each of shape (batch_size, 256, w, h)). Each proposal is assigned to one feature level by the mapping rule, RoI pooling turns them into a (512 * batch_size, 256, 7, 7) tensor, and after looping over all feature levels this tensor is sent into the bbox head. These bbox_feats of shape (batch_size * 512, 256, 7, 7) pass through the two shared fully connected layers of the bbox head, which outputs bbox_pred of shape (512, 4) and cls_score of shape (512, num_classes).

cls_score, bbox_pred = bbox_head(bbox_feats)
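The assignment of proposals to FPN levels mentioned above happens in map_roi_levels; roughly the following rule (a sketch, with finest_scale=56 as in the default config):

import torch

def map_roi_levels(rois, num_levels, finest_scale=56):
    """Map each roi (batch_idx, x1, y1, x2, y2) to an FPN level based on its scale."""
    scale = torch.sqrt((rois[:, 3] - rois[:, 1]) * (rois[:, 4] - rois[:, 2]))
    target_lvls = torch.floor(torch.log2(scale / finest_scale + 1e-6))
    return target_lvls.clamp(min=0, max=num_levels - 1).long()

Small boxes (scale below finest_scale) go to the highest-resolution level, and each doubling of the box scale moves it one level coarser.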


VII. Commonly Used Functions

1. mmcv.track_parallel_progress()

It processes a list of tasks in parallel with multiple workers, for example reading a dataset in parallel to speed things up.

def track_parallel_progress(func,
                            tasks,
                            nproc,
                            initializer=None,
                            initargs=None,
                            bar_width=50,
                            chunksize=1,
                            skip_first=False,
                            keep_order=True,
                            file=sys.stdout):

The func argument is the function applied to each task, tasks is the task list, and nproc is the number of workers.
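A minimal usage sketch (the function and the file list are made up for illustration):

import mmcv

def load_one(path):
    # per-item work, e.g. read and decode one image and return its shape
    return mmcv.imread(path).shape

image_paths = ['img_%d.jpg' % i for i in range(100)]  # placeholder task list
shapes = mmcv.track_parallel_progress(load_one, image_paths, nproc=8)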

Original: https://blog.csdn.net/phily123/article/details/121963512
Author: phily123
Title: Object Detection Study Notes: mmdetection
