yolo-V4: Decoding the Prediction Boxes

July 12, 2023, 3:33 AM • Artificial Intelligence

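This part covers how the prediction boxes on the network's three output feature layers of different sizes, (13, 13), (26, 26), and (52, 52), are decoded with the prior boxes (anchors), i.e., how the anchors are used to adjust the boxes the model predicts.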

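First come the anchors for the different feature layers. They are obtained separately with a clustering algorithm and are generally kept fixed.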

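The network's output inputs consists of three parts, one per feature layer. The channel count 255 equals 3 × (5 + 80): each feature layer has three anchors, and each anchor predicts 4 box values (x, y, w, h: the center coordinates plus width and height), 1 confidence score, and 80 class scores.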
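To make the 255-channel layout concrete, here is a small pure-Python sketch. The helpers head_channels and describe_attr are hypothetical illustrations, not part of the original code: one computes the head width for a given number of classes, the other names each position along the 5 + num_classes attribute axis.

```python
def head_channels(num_anchors, num_classes):
    """Channels of one YOLO head: per anchor, 4 box values + 1 confidence + class scores."""
    return num_anchors * (5 + num_classes)


def describe_attr(k, num_classes=80):
    """Meaning of index k along the bbox_attrs axis (length 5 + num_classes)."""
    if not 0 <= k < 5 + num_classes:
        raise IndexError(f"attribute index {k} out of range")
    names = ["x", "y", "w", "h", "confidence"]
    return names[k] if k < 5 else f"class_{k - 5}"


print(head_channels(3, 80))
print(describe_attr(4), describe_attr(5), describe_attr(84))
```

The same formula explains other datasets: a 20-class (VOC-style) model would need 3 × (5 + 20) = 75 channels per head.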

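Because the anchors are defined relative to the original input image (416, 416), they must be converted for each of the three feature layers. The conversion factor (stride) is the ratio between the input image size and the feature-map size of each element of inputs: 416 / 13 = 32, 416 / 26 = 16, and 416 / 52 = 8.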


scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in self.anchors[self.anchors_mask[i]]]
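A minimal sketch of the same rescaling for all three heads, in plain Python. For illustration only, it assumes the nine COCO anchors from Darknet's yolov4.cfg (the post's own anchor values are not reproduced in this text) and the default anchors_mask used by DecodeBox; scale_anchors is a hypothetical helper.

```python
# Illustrative anchors (w, h) in input pixels: the nine COCO anchors from
# Darknet's yolov4.cfg, assumed here for the example.
ANCHORS = [(12, 16), (19, 36), (40, 28), (36, 75), (76, 55),
           (72, 146), (142, 110), (192, 243), (459, 401)]
ANCHORS_MASK = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]  # same default as DecodeBox


def scale_anchors(anchors, anchors_mask, input_shape, feature_shapes):
    """Express each head's anchors in feature-map cell units.

    input_shape and feature_shapes are (height, width) pairs; anchors are (w, h).
    """
    scaled = []
    for mask, (fh, fw) in zip(anchors_mask, feature_shapes):
        stride_h = input_shape[0] / fh
        stride_w = input_shape[1] / fw
        scaled.append([(anchors[k][0] / stride_w, anchors[k][1] / stride_h) for k in mask])
    return scaled


scaled = scale_anchors(ANCHORS, ANCHORS_MASK, (416, 416), [(13, 13), (26, 26), (52, 52)])
for layer in scaled:
    print(layer)
```

With a 416×416 input, the 13×13 head takes anchors 6 to 8 and divides them by stride 32, so (142, 110) becomes (4.4375, 3.4375) cells; the largest anchors go to the coarsest grid.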


prediction = input.view(batch_size, len(self.anchors_mask[i]),
                                    self.bbox_attrs, input_height, input_width).permute(0, 1, 3, 4, 2).contiguous()
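What view(...).permute(0, 1, 3, 4, 2) does can be spelled out with explicit indexing. The sketch below (pure Python, one image, hypothetical helper reshape_head) builds out[a][y][x][k] = feature[a * bbox_attrs + k][y][x], the element mapping that the view/permute pair performs on the channel axis.

```python
def reshape_head(feature, num_anchors, bbox_attrs):
    """Reorder one image's head output from [C][H][W] to [A][H][W][bbox_attrs].

    Mirrors input.view(bs, A, bbox_attrs, H, W).permute(0, 1, 3, 4, 2) for a
    single image: channel c = a * bbox_attrs + k.
    """
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    if channels != num_anchors * bbox_attrs:
        raise ValueError("channel count must equal num_anchors * bbox_attrs")
    return [[[[feature[a * bbox_attrs + k][y][x] for k in range(bbox_attrs)]
              for x in range(width)]
             for y in range(height)]
            for a in range(num_anchors)]


# Toy head: 2 anchors x 3 attributes = 6 channels on a 1x2 map; value encodes (c, x).
toy = [[[c * 10 + x for x in range(2)]] for c in range(6)]
out = reshape_head(toy, num_anchors=2, bbox_attrs=3)
print(out[1][0][1])
```

After the permute, all attributes of one anchor at one cell are contiguous in the last dimension, which is why the decoding code can slice prediction[..., 0], prediction[..., 4], and so on.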


grid_x = torch.linspace(0, input_width - 1, input_width).repeat(input_height, 1).repeat(
                batch_size * len(self.anchors_mask[i]), 1, 1).view(x.shape).type(FloatTensor)
grid_y = torch.linspace(0, input_height - 1, input_height).repeat(input_width, 1).t().repeat(
                batch_size * len(self.anchors_mask[i]), 1, 1).view(y.shape).type(FloatTensor)
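The two repeat chains just build per-cell integer offsets. A plain-Python equivalent for one feature map (hypothetical helper make_grid) makes the pattern visible: grid_x varies along columns and grid_y along rows.

```python
def make_grid(height, width):
    """Return (grid_x, grid_y) as nested lists of shape [height][width].

    grid_x[r][c] == c and grid_y[r][c] == r, the same values the torch code
    produces before repeating them over batch and anchors.
    """
    grid_x = [[float(c) for c in range(width)] for _ in range(height)]
    grid_y = [[float(r) for _ in range(width)] for r in range(height)]
    return grid_x, grid_y


gx, gy = make_grid(3, 4)
print(gx[2])
print(gy[2])
```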

The complete code is as follows:

import numpy as np
import torch
from torchvision.ops import nms

class DecodeBox():
    def __init__(self, anchors, num_classes, input_shape, anchors_mask = [[6,7,8], [3,4,5], [0,1,2]]):
        super(DecodeBox, self).__init__()
        self.anchors        = anchors
        self.num_classes    = num_classes
        self.bbox_attrs     = 5 + num_classes
        self.input_shape    = input_shape

        self.anchors_mask   = anchors_mask

    def decode_box(self, inputs):
        outputs = []
        for i, input in enumerate(inputs):

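            # input: (batch_size, 3 * (5 + num_classes), input_height, input_width)
            batch_size      = input.size(0)
            input_height    = input.size(2)
            input_width     = input.size(3)

            # stride: input pixels covered by one feature-map cell (32, 16, 8 for a 416x416 input)
            stride_h = self.input_shape[0] / input_height
            stride_w = self.input_shape[1] / input_width

            # anchors are given in input-image pixels; convert them to feature-map cell units
            # (self.anchors must be a NumPy array of shape (9, 2) for this fancy indexing)
            scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in self.anchors[self.anchors_mask[i]]]

            # (batch_size, 3 * bbox_attrs, H, W) -> (batch_size, 3, H, W, bbox_attrs)
            prediction = input.view(batch_size, len(self.anchors_mask[i]),
                                    self.bbox_attrs, input_height, input_width).permute(0, 1, 3, 4, 2).contiguous()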

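            # center offsets inside a cell, squashed to (0, 1)
            x = torch.sigmoid(prediction[..., 0])
            y = torch.sigmoid(prediction[..., 1])

            # raw width/height, decoded later with exp() and the anchor size
            w = prediction[..., 2]
            h = prediction[..., 3]

            # objectness confidence
            conf        = torch.sigmoid(prediction[..., 4])

            # per-class scores (independent sigmoids, not a softmax)
            pred_cls    = torch.sigmoid(prediction[..., 5:])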

            FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
            LongTensor  = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor

            # grid_x[..., r, c] = c and grid_y[..., r, c] = r: offset of each cell's top-left corner
            grid_x = torch.linspace(0, input_width - 1, input_width).repeat(input_height, 1).repeat(
                batch_size * len(self.anchors_mask[i]), 1, 1).view(x.shape).type(FloatTensor)
            grid_y = torch.linspace(0, input_height - 1, input_height).repeat(input_width, 1).t().repeat(
                batch_size * len(self.anchors_mask[i]), 1, 1).view(y.shape).type(FloatTensor)

            # anchor widths/heights broadcast to (batch_size, 3, H, W)
            anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0]))
            anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1]))
            anchor_w = anchor_w.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(w.shape)
            anchor_h = anchor_h.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(h.shape)

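            # b_x = sigmoid(t_x) + c_x, b_y = sigmoid(t_y) + c_y,
            # b_w = p_w * exp(t_w), b_h = p_h * exp(t_h)   (all in cell units)
            pred_boxes          = FloatTensor(prediction[..., :4].shape)
            pred_boxes[..., 0]  = x.data + grid_x
            pred_boxes[..., 1]  = y.data + grid_y
            pred_boxes[..., 2]  = torch.exp(w.data) * anchor_w
            pred_boxes[..., 3]  = torch.exp(h.data) * anchor_h

            # divide by the feature-map size to normalize boxes to the network input, then
            # flatten to (batch_size, 3 * H * W, 4 + 1 + num_classes)
            _scale = torch.Tensor([input_width, input_height, input_width, input_height]).type(FloatTensor)
            output = torch.cat((pred_boxes.view(batch_size, -1, 4) / _scale,
                                conf.view(batch_size, -1, 1), pred_cls.view(batch_size, -1, self.num_classes)), -1)
            outputs.append(output.data)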
        return outputs

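At this point outputs holds the decoded boxes from the three feature layers. With a batch size of 1, the first feature layer's output has shape (1, 507, 85), where 507 = 3 × 13 × 13 is the number of anchor boxes on that layer. The -1 in pred_boxes.view(batch_size, -1, 4) merges the (3, 13, 13) dimensions (the same ones as in the (1, 3, 13, 13, 85) prediction) into a single box dimension.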
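For a single cell and anchor, the arithmetic in decode_box reduces to a few lines. The sketch below assumes raw outputs t = (tx, ty, tw, th), the cell's column/row (cx, cy), an anchor already in cell units, and the feature-map size; like decode_box, it returns a box normalized to the network input. count_boxes checks the box totals across the three heads. Both helpers are hypothetical illustrations, not part of the original code.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_cell(t, cx, cy, anchor_wh, grid_wh):
    """Decode one anchor's raw outputs t = (tx, ty, tw, th) at cell (cx, cy).

    anchor_wh: anchor (w, h) already divided by the stride (cell units).
    grid_wh:   feature-map (width, height), e.g. (13, 13).
    Returns (bx, by, bw, bh) normalized to the network input, like decode_box.
    """
    tx, ty, tw, th = t
    aw, ah = anchor_wh
    gw, gh = grid_wh
    bx = (sigmoid(tx) + cx) / gw
    by = (sigmoid(ty) + cy) / gh
    bw = math.exp(tw) * aw / gw
    bh = math.exp(th) * ah / gh
    return bx, by, bw, bh


def count_boxes(feature_shapes, num_anchors=3):
    """Number of candidate boxes summed over all heads."""
    return sum(num_anchors * h * w for h, w in feature_shapes)


box = decode_cell((0.0, 0.0, 0.0, 0.0), 6, 6, (4.4375, 3.4375), (13, 13))
print([round(v * 416, 4) for v in box])
print(count_boxes([(13, 13), (26, 26), (52, 52)]))
```

With all-zero raw outputs the box sits at the center of its cell with exactly the anchor's size; multiplying by 416 recovers pixel units. Over the three heads a 416×416 input yields 507 + 2028 + 8112 candidate boxes per image.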

    def yolo_correct_boxes(self, box_xy, box_wh, input_shape, image_shape, letterbox_image):

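        # work in (y, x) and (h, w) order to match image_shape = (height, width)
        box_yx = box_xy[..., ::-1]
        box_hw = box_wh[..., ::-1]
        input_shape = np.array(input_shape)
        image_shape = np.array(image_shape)

        if letterbox_image:
            # size of the resized image inside the letterboxed input (aspect ratio preserved)
            new_shape = np.round(image_shape * np.min(input_shape/image_shape))
            # normalized padding offset of the valid region, and the rescale factor back to it
            offset  = (input_shape - new_shape)/2./input_shape
            scale   = input_shape/new_shape

            box_yx  = (box_yx - offset) * scale
            box_hw *= scale

        # center/size -> corners [ymin, xmin, ymax, xmax], then scale to original-image pixels
        box_mins    = box_yx - (box_hw / 2.)
        box_maxes   = box_yx + (box_hw / 2.)
        boxes  = np.concatenate([box_mins[..., 0:1], box_mins[..., 1:2], box_maxes[..., 0:1], box_maxes[..., 1:2]], axis=-1)
        boxes *= np.concatenate([image_shape, image_shape], axis=-1)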
        return boxes

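This method undoes the letterbox padding. Using offset, the position of the valid image region relative to the top-left corner of the network input, it shifts and rescales the boxes back onto the resized image, converts them from center/size to corners [ymin, xmin, ymax, xmax], and scales them to original-image pixels.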
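A single-box, plain-Python version of yolo_correct_boxes can help when checking the offset and scale by hand. It is a sketch under the same conventions as the method above: box_xy and box_wh are normalized (x, y) and (w, h), both shapes are (height, width), and the result is [ymin, xmin, ymax, xmax] in original-image pixels. correct_box is a hypothetical name.

```python
def correct_box(box_xy, box_wh, input_shape, image_shape, letterbox_image=True):
    """Map one normalized box from the network input back to the original image.

    box_xy = (x, y) center, box_wh = (w, h) size, both normalized to the input.
    input_shape and image_shape are (height, width).
    Returns (ymin, xmin, ymax, xmax) in original-image pixels.
    """
    x, y = box_xy
    w, h = box_wh
    in_h, in_w = input_shape
    im_h, im_w = image_shape
    if letterbox_image:
        ratio = min(in_h / im_h, in_w / im_w)      # resize factor that preserves aspect ratio
        new_h, new_w = round(im_h * ratio), round(im_w * ratio)
        off_y = (in_h - new_h) / 2.0 / in_h        # normalized padding above the image
        off_x = (in_w - new_w) / 2.0 / in_w        # normalized padding left of the image
        scale_y, scale_x = in_h / new_h, in_w / new_w
        y, x = (y - off_y) * scale_y, (x - off_x) * scale_x
        h, w = h * scale_y, w * scale_x
    return ((y - h / 2) * im_h, (x - w / 2) * im_w,
            (y + h / 2) * im_h, (x + w / 2) * im_w)


print(correct_box((0.5, 0.5), (0.5, 0.25), (416, 416), (208, 416)))
```

For example, a 208×416 (h×w) image letterboxed into a 416×416 input gets 104 px of padding above and below. A box centered in the input with normalized size w = 0.5, h = 0.25 then maps to (52, 104, 156, 312) in the original image.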

    def non_max_suppression(self, prediction, num_classes, input_shape, image_shape, letterbox_image, conf_thres=0.5, nms_thres=0.4):

        # convert (center_x, center_y, w, h) to corner format (x1, y1, x2, y2) for NMS
        box_corner          = prediction.new(prediction.shape)
        box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
        box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
        box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
        box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
        prediction[:, :, :4] = box_corner[:, :, :4]

        output = [None for _ in range(len(prediction))]
        for i, image_pred in enumerate(prediction):

            # best class score and its index for every box
            class_conf, class_pred = torch.max(image_pred[:, 5:5 + num_classes], 1, keepdim=True)

            # keep boxes whose objectness * best class score reaches conf_thres
            conf_mask = image_pred[:, 4] * class_conf[:, 0] >= conf_thres

            image_pred = image_pred[conf_mask]
            class_conf = class_conf[conf_mask]
            class_pred = class_pred[conf_mask]
            if not image_pred.size(0):
                continue

            # each row: x1, y1, x2, y2, obj_conf, class_conf, class_pred
            detections = torch.cat((image_pred[:, :5], class_conf.float(), class_pred.float()), 1)

            unique_labels = detections[:, -1].cpu().unique()

            if prediction.is_cuda:
                unique_labels = unique_labels.cuda()
                detections = detections.cuda()

            for c in unique_labels:
                # run NMS separately for each class; nms returns the indices to keep,
                # sorted by decreasing score (objectness * class confidence)
                detections_class = detections[detections[:, -1] == c]

                keep = nms(
                    detections_class[:, :4],
                    detections_class[:, 4] * detections_class[:, 5],
                    nms_thres
                )
                max_detections = detections_class[keep]

                output[i] = max_detections if output[i] is None else torch.cat((output[i], max_detections))

            if output[i] is not None:
                # back to center/size, then undo letterboxing; the first four columns
                # become ymin, xmin, ymax, xmax in original-image pixels
                output[i]           = output[i].cpu().numpy()
                box_xy, box_wh      = (output[i][:, 0:2] + output[i][:, 2:4])/2, output[i][:, 2:4] - output[i][:, 0:2]
                output[i][:, :4]    = self.yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image)
        return output

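This method converts the boxes to corner format, keeps those whose objectness × class confidence reaches conf_thres, and then runs NMS separately for each class, which discards every box that overlaps a higher-scoring box of the same class with an IoU above nms_thres. Finally the surviving boxes are converted back to (x, y, w, h) and passed to yolo_correct_boxes to map them onto the original image. Because prediction is indexed as a (batch_size, num_boxes, 5 + num_classes) tensor, the three per-layer outputs of decode_box have to be concatenated along dimension 1 (for example with torch.cat(outputs, 1)) before this call.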
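The per-class NMS step can be reproduced without torch for small cases. This greedy sketch mirrors the logic above: score = objectness × class confidence, a conf_thres filter, then suppression of any box whose IoU with an already kept box of the same class exceeds nms_thres. It stands in for torchvision.ops.nms and is not its actual implementation; iou and per_class_nms are hypothetical names.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def score(d):
    """Objectness * class confidence, as used for filtering and ranking."""
    return d[4] * d[5]


def per_class_nms(detections, conf_thres=0.5, nms_thres=0.4):
    """detections: tuples (x1, y1, x2, y2, obj_conf, class_conf, class_id)."""
    candidates = [d for d in detections if score(d) >= conf_thres]
    kept = []
    for c in sorted({d[6] for d in candidates}):
        # highest score first within this class
        remaining = sorted((d for d in candidates if d[6] == c), key=score, reverse=True)
        while remaining:
            best = remaining.pop(0)
            kept.append(best)
            # drop boxes that overlap the kept box by more than nms_thres
            remaining = [d for d in remaining if iou(best, d) <= nms_thres]
    return kept


dets = [
    (0, 0, 10, 10, 0.9, 0.9, 0),    # score 0.81
    (1, 1, 11, 11, 0.8, 0.9, 0),    # score 0.72, IoU 81/119 with the first box
    (0, 0, 10, 10, 0.9, 0.8, 1),    # same box, different class
    (20, 20, 30, 30, 0.3, 0.9, 0),  # score 0.27, below conf_thres
]
for d in per_class_nms(dets):
    print(d)
```

Two boxes that overlap heavily but belong to different classes both survive, because suppression only runs within a class.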

Original: https://blog.csdn.net/m0_47709941/article/details/121190744
Author: 古月哥欠666
Title: yolo-V4: Decoding the Prediction Boxes

Original articles are protected by copyright. When reposting, please credit the source: https://www.johngo689.com/686852/
