Object Detection 5: An Introduction to Megvii's YOLOX Algorithm

June 17, 2023, 9:42 AM • Artificial Intelligence

Contents

– 1. Introduction
– 2. Main work in YoloX
  + 2.1 Decoupled classification/regression head (Decoupled head)
  + 2.2 SimOTA
  + 2.3 End2End Yolo (NMS Free)
  + 2.4 Other
– References

Welcome to my personal blog 🌹🌹知行空间🌹🌹

Paper: http://arxiv.org/abs/2107.08430

Code: https://github.com/Megvii-BaseDetection/YOLOX

1. Introduction

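YOLOX is an anchor-free detection algorithm proposed in a paper submitted in July 2021 by Zheng Ge, Songtao Liu and colleagues at Megvii. The work focuses on a decoupled head and the label assignment strategy SimOTA. Using YOLOX, the authors (as BaseDet) won first place in the Streaming Perception Challenge of the Autonomous Driving workshop at CVPR 2021.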

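After YoloV1, the papers in the YOLO series were all anchor-based. Meanwhile, anchor-free detectors such as CornerNet, CenterNet and FCOS kept improving, so the YoloX authors set out to bring anchor-free techniques back to Yolo. They regard YoloV4 and YoloV5 as over-optimized anchor-based algorithms, so YoloX is mainly compared against YoloV3. The baseline used in YoloX is YoloV3-SPP, to which the authors added EMA weight updating, a cosine lr schedule and similar training strategies.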

2. Main work in YoloX

2.1 Decoupled classification/regression head (Decoupled head)

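In the R-CNN family of detectors, bounding box regression and object classification are handled by two separate output heads, whereas the Yolo Head used throughout the Yolo series emits regression and class information together, in a single vector produced by one shared network branch. As shown in Figure 1, YoloX brings back the Decoupled Head, and the authors show experimentally that it improves detection in two ways:

1) It speeds up convergence.

2) The performance drop is smaller when using NMS Free to build an End2End Yolo.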

2.2 SimOTA

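Label assignment is a key problem in object detection. Earlier methods mostly decided whether an anchor box is positive or negative by Max IoU, a static matching scheme based on prior knowledge. Because an image contains only a limited number of objects, the vast majority of anchors end up negative, which causes severe class imbalance. ATSS proposed adaptive training sample selection, which adaptively determines the IoU threshold that separates positive from negative anchors. Label assignment can also be posed globally: given the ground truth boxes, assign a label to every anchor box so that the overall result is optimal. This fits a bipartite matching model, so a number of works build on the Hungarian algorithm or on optimal transport (Optimal Transport Assignment, OTA). OTA was proposed by the YoloX authors in a paper submitted in March 2021 and performs dynamic label matching.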
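To make the imbalance of static Max-IoU matching concrete, here is a minimal plain-Python sketch. The anchor layout, the single ground truth box and the 0.5 threshold are all assumptions chosen for illustration, not values from the paper.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def max_iou_assign(anchors, gt_boxes, pos_thr=0.5):
    """Return, for each anchor, the matched gt index or -1 (negative)."""
    assigned = []
    for anchor in anchors:
        ious = [iou(anchor, gt) for gt in gt_boxes]
        best = max(range(len(gt_boxes)), key=lambda j: ious[j])
        assigned.append(best if ious[best] >= pos_thr else -1)
    return assigned


# Hypothetical 8x8 grid of 32x32 anchors tiling a 256x256 image.
anchors = [(x, y, x + 32, y + 32)
           for y in range(0, 256, 32) for x in range(0, 256, 32)]
gt_boxes = [(30, 30, 66, 66)]  # a single made-up object

assigned = max_iou_assign(anchors, gt_boxes)
num_pos = sum(1 for a in assigned if a >= 0)
print(f"positives: {num_pos}, negatives: {len(assigned) - num_pos}")
```

With one object and a fixed threshold, only the anchor that nearly coincides with the object is positive; every other anchor is negative, which is the imbalance described above.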

Four key points of advanced label assignment:

1) loss/quality aware
2) center prior
3) dynamic number of positive anchors
4) global view

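The YoloX authors note that solving the OTA problem with the Sinkhorn-Knopp algorithm adds about 25% to training time, so YoloX uses a dynamic top-k strategy called SimOTA to obtain an approximate solution.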

The SimOTA procedure:

1) First compute the cost for every prediction-gt pair:

$$cost_{ij} = L^{cls}_{ij} + \lambda L^{reg}_{ij}$$

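2) For each ground truth $g_i$, select the top k predictions with the lowest cost within a center region as positive samples; all others are negative samples. The hyperparameter k takes a different value for each ground truth box; for how it is chosen, see the authors' explanation in the OTA paper and the linked blog post.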
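The two steps above can be sketched in plain Python for a single ground truth. The loss and IoU values are made up, `lam=3.0` and `candidate_topk=10` are assumed defaults, and the regression loss is taken to be `-log(IoU)`. The rule for k (sum of the largest candidate IoUs, truncated to an integer, at least 1) follows the `dynamic_k_matching` code later in this section.

```python
import math


def simota_select(cls_losses, ious, lam=3.0, candidate_topk=10, eps=1e-7):
    """Pick positive predictions for one ground truth box.

    cls_losses[j] and ious[j] describe prediction j against this ground truth.
    Returns (positive_indices, k).
    """
    # Step 1: cost_j = L_cls + lam * L_reg, with L_reg = -log(IoU).
    costs = [c + lam * -math.log(i + eps) for c, i in zip(cls_losses, ious)]

    # Dynamic k: sum the largest IoUs, truncate to int, keep at least 1.
    top_ious = sorted(ious, reverse=True)[:candidate_topk]
    k = max(1, int(sum(top_ious)))

    # Step 2: the k predictions with the lowest cost become positives.
    order = sorted(range(len(costs)), key=lambda j: costs[j])
    return sorted(order[:k]), k


cls_losses = [0.2, 0.9, 0.4, 1.5, 0.3]  # made-up classification losses
ious = [0.8, 0.3, 0.7, 0.1, 0.6]        # made-up IoUs with the ground truth

positives, k = simota_select(cls_losses, ious)
print(f"dynamic k = {k}, positive predictions = {positives}")
```

Because k is derived from the IoUs, a ground truth that many predictions already fit well receives more positives than one that is poorly covered.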

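Source code analysis:

This follows the SimOTA implementation in mmdetection; see mmdet/core/bbox/assigners/sim_ota_assigner.py.

1. Call get_in_gt_and_in_center_info first.

1.1 Does the prior center fall inside a gt bbox?

1.2 Does the prior center fall inside the rectangle extending center_radius*stride_x/y around a gt box center?

2. Make a preliminary selection of candidate positive priors: is_in_gts_or_centers.

3. bbox_overlaps computes the IoU between the candidates from step 2 and each gt box, which gives iou_cost.

4. binary_cross_entropy gives cls_cost.

5. cost_matrix has shape num_pos x gt_box_nums.

6. dynamic_k_matching yields the gt_box matched to each positive_prior.

Matching is complete.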
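Steps 1.1 and 1.2 come down to two point-in-rectangle tests. The sketch below is a plain-Python version for a single prior and a single gt box. The function name `center_prior_flags`, the example boxes and `center_radius=2.5` are assumptions made for illustration.

```python
def center_prior_flags(prior, gt_box, center_radius=2.5):
    """prior = (cx, cy, stride_w, stride_h); gt_box = (x1, y1, x2, y2).

    Returns (in_gt, in_center): whether the prior center lies strictly inside
    the gt box, and strictly inside the center_radius * stride rectangle
    around the gt center.
    """
    cx, cy, sw, sh = prior
    x1, y1, x2, y2 = gt_box
    in_gt = min(cx - x1, cy - y1, x2 - cx, y2 - cy) > 0

    gcx, gcy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    in_center = min(cx - (gcx - center_radius * sw),
                    cy - (gcy - center_radius * sh),
                    (gcx + center_radius * sw) - cx,
                    (gcy + center_radius * sh) - cy) > 0
    return in_gt, in_center


big_gt = (100, 100, 300, 200)    # center (200, 150)
small_gt = (190, 140, 210, 160)  # center (200, 150), smaller than the radius

print(center_prior_flags((204, 148, 8, 8), big_gt))    # inside box, near center
print(center_prior_flags((110, 110, 8, 8), big_gt))    # inside box, far from center
print(center_prior_flags((200, 135, 8, 8), small_gt))  # outside box, near center
```

A prior passing either test is kept as a candidate (step 2). Pairs that fail at least one of the two tests receive the large INF penalty in the cost matrix.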

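import torch
import torch.nn.functional as F
# Import paths below follow the mmdetection 2.x package layout.
from mmdet.core.bbox.assigners.assign_result import AssignResult
from mmdet.core.bbox.assigners.base_assigner import BaseAssigner
from mmdet.core.bbox.iou_calculators import bbox_overlaps


class SimOTAAssigner(BaseAssigner):
    def __init__(self,):
        # Constructor body elided here; the methods below rely on
        # self.center_radius, self.candidate_topk, self.iou_weight
        # and self.cls_weight.
        ...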

    def _assign(self,
                pred_scores,
                priors,
                decoded_bboxes,
                gt_bboxes,
                gt_labels,
                gt_bboxes_ignore=None,
                eps=1e-7):
        """Assign gt to priors using SimOTA.

        Args:
            pred_scores (Tensor): Classification scores of one image,
                a 2D-Tensor with shape [num_priors, num_classes]
            priors (Tensor): All priors of one image, a 2D-Tensor with shape
                [num_priors, 4] in [cx, cy, stride_w, stride_h] format.

            decoded_bboxes (Tensor): Predicted bboxes, a 2D-Tensor with shape
                [num_priors, 4] in [tl_x, tl_y, br_x, br_y] format.

            gt_bboxes (Tensor): Ground truth bboxes of one image, a 2D-Tensor
                with shape [num_gts, 4] in [tl_x, tl_y, br_x, br_y] format.

            gt_labels (Tensor): Ground truth labels of one image, a Tensor
                with shape [num_gts].

            gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are
                labelled as ignored, e.g., crowd boxes in COCO.

            eps (float): A value added to the denominator for numerical
                stability. Default 1e-7.

        Returns:
            :obj:`AssignResult`: The assigned result.
        """
        INF = 100000.0
        num_gt = gt_bboxes.size(0)
        num_bboxes = decoded_bboxes.size(0)

        assigned_gt_inds = decoded_bboxes.new_full((num_bboxes, ),
                                                   0,
                                                   dtype=torch.long)
        valid_mask, is_in_boxes_and_center = self.get_in_gt_and_in_center_info(
            priors, gt_bboxes)
        valid_decoded_bbox = decoded_bboxes[valid_mask]
        valid_pred_scores = pred_scores[valid_mask]
        num_valid = valid_decoded_bbox.size(0)

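        if num_gt == 0 or num_bboxes == 0 or num_valid == 0:
            # No ground truth or no candidate priors: return an empty assignment.
            max_overlaps = decoded_bboxes.new_zeros((num_bboxes, ))
            if num_gt == 0:
                # No ground truth: every prior is background.
                assigned_gt_inds[:] = 0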
            if gt_labels is None:
                assigned_labels = None
            else:
                assigned_labels = decoded_bboxes.new_full((num_bboxes, ),
                                                          -1,
                                                          dtype=torch.long)
            return AssignResult(
                num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels)

        # IoU cost: -log(IoU) between every valid prediction and every gt box.
        pairwise_ious = bbox_overlaps(valid_decoded_bbox, gt_bboxes)
        iou_cost = -torch.log(pairwise_ious + eps)

        gt_onehot_label = (
            F.one_hot(gt_labels.to(torch.int64),
                      pred_scores.shape[-1]).float().unsqueeze(0).repeat(
                          num_valid, 1, 1))

        valid_pred_scores = valid_pred_scores.unsqueeze(1).repeat(1, num_gt, 1)
        cls_cost = (
            F.binary_cross_entropy(
                valid_pred_scores.to(dtype=torch.float32).sqrt_(),
                gt_onehot_label,
                reduction='none',
            ).sum(-1).to(dtype=valid_pred_scores.dtype))

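        # Total cost. Pairs that are not inside both the gt box and its center
        # region get a large penalty, so they are selected last.
        cost_matrix = (
            cls_cost * self.cls_weight + iou_cost * self.iou_weight +
            (~is_in_boxes_and_center) * INF)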

        matched_pred_ious, matched_gt_inds = \
            self.dynamic_k_matching(
                cost_matrix, pairwise_ious, num_gt, valid_mask)

        # 0 means background; a value of i + 1 means matched to gt i.
        assigned_gt_inds[valid_mask] = matched_gt_inds + 1
        assigned_labels = assigned_gt_inds.new_full((num_bboxes, ), -1)
        assigned_labels[valid_mask] = gt_labels[matched_gt_inds].long()
        max_overlaps = assigned_gt_inds.new_full((num_bboxes, ),
                                                 -INF,
                                                 dtype=torch.float32)
        max_overlaps[valid_mask] = matched_pred_ious
        return AssignResult(
            num_gt, assigned_gt_inds, max_overlaps, labels=assigned_labels)

    def get_in_gt_and_in_center_info(self, priors, gt_bboxes):
        num_gt = gt_bboxes.size(0)

        repeated_x = priors[:, 0].unsqueeze(1).repeat(1, num_gt)
        repeated_y = priors[:, 1].unsqueeze(1).repeat(1, num_gt)
        repeated_stride_x = priors[:, 2].unsqueeze(1).repeat(1, num_gt)
        repeated_stride_y = priors[:, 3].unsqueeze(1).repeat(1, num_gt)

        l_ = repeated_x - gt_bboxes[:, 0]
        t_ = repeated_y - gt_bboxes[:, 1]
        r_ = gt_bboxes[:, 2] - repeated_x
        b_ = gt_bboxes[:, 3] - repeated_y

        # A prior center is inside a gt box when its distances to all four
        # sides are positive.
        deltas = torch.stack([l_, t_, r_, b_], dim=1)
        is_in_gts = deltas.min(dim=1).values > 0
        is_in_gts_all = is_in_gts.sum(dim=1) > 0

        # Build a rectangle of half-size center_radius * stride around each
        # gt center.
        gt_cxs = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2.0
        gt_cys = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2.0
        ct_box_l = gt_cxs - self.center_radius * repeated_stride_x
        ct_box_t = gt_cys - self.center_radius * repeated_stride_y
        ct_box_r = gt_cxs + self.center_radius * repeated_stride_x
        ct_box_b = gt_cys + self.center_radius * repeated_stride_y

        cl_ = repeated_x - ct_box_l
        ct_ = repeated_y - ct_box_t
        cr_ = ct_box_r - repeated_x
        cb_ = ct_box_b - repeated_y

        ct_deltas = torch.stack([cl_, ct_, cr_, cb_], dim=1)
        is_in_cts = ct_deltas.min(dim=1).values > 0
        is_in_cts_all = is_in_cts.sum(dim=1) > 0

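        # Candidates: priors inside any gt box OR inside any center region.
        is_in_gts_or_centers = is_in_gts_all | is_in_cts_all

        # For each candidate and gt: inside the gt box AND its center region.
        is_in_boxes_and_centers = (
            is_in_gts[is_in_gts_or_centers, :]
            & is_in_cts[is_in_gts_or_centers, :])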
        return is_in_gts_or_centers, is_in_boxes_and_centers

    def dynamic_k_matching(self, cost, pairwise_ious, num_gt, valid_mask):
        matching_matrix = torch.zeros_like(cost, dtype=torch.uint8)

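        # Estimate the number of positives per gt (dynamic k) from the summed
        # IoUs of its top candidates, with at least one positive per gt.
        candidate_topk = min(self.candidate_topk, pairwise_ious.size(0))

        topk_ious, _ = torch.topk(pairwise_ious, candidate_topk, dim=0)

        dynamic_ks = torch.clamp(topk_ious.sum(0).int(), min=1)
        for gt_idx in range(num_gt):
            # Select the dynamic_ks[gt_idx] priors with the lowest cost.
            _, pos_idx = torch.topk(
                cost[:, gt_idx], k=dynamic_ks[gt_idx], largest=False)
            matching_matrix[:, gt_idx][pos_idx] = 1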

        # The matching_matrix built above may assign one prior to several
        # gt_boxes. For such a prior, keep only the gt_box with the lowest cost.
        del topk_ious, dynamic_ks, pos_idx

        prior_match_gt_mask = matching_matrix.sum(1) > 1
        if prior_match_gt_mask.sum() > 0:
            cost_min, cost_argmin = torch.min(
                cost[prior_match_gt_mask, :], dim=1)
            matching_matrix[prior_match_gt_mask, :] *= 0
            matching_matrix[prior_match_gt_mask, cost_argmin] = 1

        # Update valid_mask in place so it marks only the final positive priors.
        fg_mask_inboxes = matching_matrix.sum(1) > 0
        valid_mask[valid_mask.clone()] = fg_mask_inboxes

        matched_gt_inds = matching_matrix[fg_mask_inboxes, :].argmax(1)
        matched_pred_ious = (matching_matrix *
                             pairwise_ious).sum(1)[fg_mask_inboxes]
        return matched_pred_ious, matched_gt_inds
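The last part of `dynamic_k_matching` resolves priors that were selected by more than one gt box. The following plain-Python sketch mirrors that step on nested lists instead of tensors. The function name `resolve_conflicts` and the numbers are hypothetical. As in the code above, the minimum is taken over the whole cost row of the conflicting prior.

```python
def resolve_conflicts(matching, cost):
    """matching[p][g] is 1 if prior p was selected for gt g; cost has the same shape.

    A prior selected by several gts keeps only the gt with the lowest cost.
    Returns a new matrix and leaves the inputs untouched.
    """
    resolved = [row[:] for row in matching]
    for p, row in enumerate(matching):
        if sum(row) > 1:
            best_gt = min(range(len(row)), key=lambda g: cost[p][g])
            resolved[p] = [1 if g == best_gt else 0 for g in range(len(row))]
    return resolved


matching = [[1, 0],   # prior 0 -> gt 0
            [1, 1],   # prior 1 selected by both gts (conflict)
            [0, 1],   # prior 2 -> gt 1
            [0, 0]]   # prior 3 is a negative
cost = [[0.5, 2.0],
        [1.2, 0.7],
        [3.0, 0.4],
        [5.0, 6.0]]

resolved = resolve_conflicts(matching, cost)
print(resolved)
```

After this step every row contains at most one 1, so each positive prior has exactly one gt box.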

2.3 End2End Yolo(NMS Free)

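In December 2020, Jianfeng Wang et al. from Megvii submitted the paper End-to-End Object Detection with Fully Convolutional Network, which proposed a detector that needs no non-maximum suppression as post-processing. It changes label assignment from the usual one2many scheme to one2one; for more detail, see the article the authors posted on Zhihu. In January 2021, Qiang Zhou et al. from Alibaba submitted [Object Detection Made Simpler by Eliminating Heuristic NMS](https://arxiv.org/abs/2101.11782), which also uses one2one label assignment to build an NMS Free detector. YoloX follows these methods for End2End training and obtains the results marked in gray in the table; both inference speed and accuracy are affected to some degree.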

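I have been focusing on label assignment recently, so I have not yet read the NMS Free source code and will come back to it later. NMS Free is still very worthwhile: when a model is deployed on an edge device, NMS post-processing cannot be accelerated on the NPU the way the network itself can, so it has to run on the CPU.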
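For context, the post-processing step that an NMS Free detector removes looks roughly like the greedy procedure below. This is a generic textbook sketch in plain Python, not code from YoloX; the boxes, scores and the 0.5 threshold are made up.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS: return the indices of the kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

The loop is sequential and data-dependent, which is one reason it maps poorly onto accelerators and usually stays on the CPU.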

2.4 Other

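Data augmentation methods
Mosaic: introduced by the YoloV5 author in ultralytics-YOLOv3.
MixUp: proposed in the 2018 paper mixup: Beyond Empirical Risk Minimization; first used for image classification, and used as data augmentation for object detection since BoF.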
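As a rough illustration of MixUp, the sketch below blends two tiny grayscale images pixel by pixel with a weight drawn from a Beta distribution. Keeping the boxes of both images and `alpha=1.5` are assumptions; implementations differ in how they combine labels and pick the weight, and this is not the YoloX implementation.

```python
import random


def mixup_detection(img_a, boxes_a, img_b, boxes_b, alpha=1.5, rng=random):
    """Blend two equally sized grayscale images and merge their boxes.

    Images are nested lists of pixel values. Returns (mixed_image, boxes, lam).
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight between 0 and 1
    mixed = [[lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
             for row_a, row_b in zip(img_a, img_b)]
    # Assumption: the mixed image keeps the boxes of both source images.
    return mixed, boxes_a + boxes_b, lam


img_a = [[0.0, 0.0], [0.0, 0.0]]          # made-up 2x2 black image
img_b = [[100.0, 100.0], [100.0, 100.0]]  # made-up 2x2 gray image
mixed, boxes, lam = mixup_detection(
    img_a, [(0, 0, 1, 1, "cat")], img_b, [(1, 1, 2, 2, "dog")],
    rng=random.Random(0))
print(f"lam = {lam:.3f}, pixel = {mixed[0][0]:.1f}, boxes = {len(boxes)}")
```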

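Anchor Free: the approach is similar to FCOS. Each feature map cell predicts only one box, which is equivalent to Anchor Number = 1. To obtain more positive samples, every feature map cell that falls within a certain range of the object center is treated as a positive sample, i.e. Multiple Positive Samples.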
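The multiple-positives idea can be sketched as follows: on one feature map level, every cell whose center lies close enough to the object center is marked positive. The function name `center_region_positives`, the cell-center convention `(gx + 0.5) * stride` and `radius=1.5` are assumptions made for this example.

```python
def center_region_positives(gt_box, stride, grid_w, grid_h, radius=1.5):
    """Return the (gx, gy) cells whose centers lie within radius * stride
    of the gt box center along both axes."""
    cx = (gt_box[0] + gt_box[2]) / 2.0
    cy = (gt_box[1] + gt_box[3]) / 2.0
    cells = []
    for gy in range(grid_h):
        for gx in range(grid_w):
            px, py = (gx + 0.5) * stride, (gy + 0.5) * stride
            if abs(px - cx) < radius * stride and abs(py - cy) < radius * stride:
                cells.append((gx, gy))
    return cells


# Hypothetical 8x8 feature map with stride 8 (a 64x64 input image).
cells = center_region_positives((24, 24, 48, 48), stride=8, grid_w=8, grid_h=8)
print(len(cells), cells)
```

With these numbers the object center coincides with a cell center, so the positives form a 3x3 block of cells instead of the single cell a strict one-positive-per-object rule would give.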


References

Original: https://blog.csdn.net/lx_ros/article/details/127273237
Author: 恒友成
Title: Object Detection 5: An Introduction to Megvii's YOLOX Algorithm
