YOLO v7 Source Code Analysis

1. Command-Line Arguments

YOLO v7's command-line arguments are almost identical to YOLO v5's, so the YOLO v5 argument descriptions are reused here.

    --weights: initial weights
    --cfg: model configuration file
    --data: dataset configuration file
    --hyp: hyperparameter file (learning rate, etc.)
    --epochs: number of training epochs
    --imgsz: image size
    --rect: rectangular training; instead of resizing to a square, the image is padded with gray bars so it is not distorted
    --resume: resume the most recent training run from last.pt
    --nosave: only save the final checkpoint
    --noval: only run validation on the final epoch
    --noautoanchor: disable AutoAnchor
    --noplots: do not save plot files
    --evolve: evolve hyperparameters for x generations
    --bucket: upload/download bucket; the YOLOv5 authors keep some files on Google cloud storage that can be fetched from here
    --cache: cache data in RAM or on disk
    --image-weights: weighted image sampling; images with poor results are sampled with higher weight
    --single-cls: train as single-class (all class labels set to 0)
    --device: GPU selection
    --multi-scale: vary the image size by +/-50%, keeping it divisible by 32
    --optimizer: choice of optimizer
    --sync-bn: use SyncBatchNorm; only supported in DDP mode, for batch norm across GPUs
    --workers: maximum number of dataloader workers (per RANK in DDP mode)
    --project: directory in which results are saved
    --name: name of the run / log directory
    --exist-ok: allow an existing project/name to be reused instead of incrementing it
    --quad: how the dataloader collates data; can be enabled for large 1280 images (see the note at the end of this list)
    --cos-lr: cosine learning-rate scheduler
    --label-smoothing: label smoothing epsilon
    --patience: stop training after this many epochs without improvement (early stopping)
    --freeze: freeze layers for transfer learning
    --save-period: save a checkpoint every x epochs (disabled if < 1)
    --seed: random seed
    --local_rank: GPU index for DDP
    --entity: logging/visualization entity (e.g. for W&B)
    --quad (detailed note):
    The quad dataloader is an experimental feature that may allow some of the benefits of training at a higher --img size while training at a lower --img size.
    The quad collate function reshapes a batch from 16x3x640x640 to 4x3x1280x1280. On its own this has little effect, since it merely rearranges the mosaics in the batch,
    but interestingly it allows some images in the batch to be scaled up 2x (one of the 4 mosaics in each quad is enlarged 2x and the other 3 mosaics are dropped).
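
As a reference for how a few of these flags are typically declared, here is a minimal argparse sketch in the style of the YOLOv5/YOLOv7 train.py (names follow that convention; this is not the actual parser):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--weights', type=str, default='yolov7.pt', help='initial weights path')
parser.add_argument('--cfg', type=str, default='', help='model yaml')
parser.add_argument('--data', type=str, default='data/coco.yaml', help='dataset yaml')
parser.add_argument('--hyp', type=str, default='data/hyp.scratch.yaml', help='hyperparameters yaml')
parser.add_argument('--epochs', type=int, default=300)
parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='train, test image sizes')
parser.add_argument('--rect', action='store_true', help='rectangular training')
parser.add_argument('--device', default='', help='cuda device, e.g. 0 or 0,1,2,3 or cpu')
parser.add_argument('--workers', type=int, default=8, help='max dataloader workers')
opt = parser.parse_args()
print(opt)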

2. Training Strategy

EMA (exponential moving average): when updating the model weights we want the update to be smoother and to take more history into account, so a moving average of the weights is maintained:

v_t = β · v_{t-1} + (1 − β) · θ_t

where θ_t are the current model weights and β is the decay coefficient. With coefficient β, the average effectively covers roughly the last 1/(1 − β) update steps. YOLO v7 uses β = 0.999 by default, which corresponds to considering roughly the last 1000 updates.
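
As an illustration, a minimal EMA wrapper in the spirit of the ModelEMA class in the YOLOv5/YOLOv7 utilities might look like the sketch below; the decay-ramp schedule and names are assumptions modeled on that class, not a verbatim copy:

import math
from copy import deepcopy

import torch
import torch.nn as nn


class SimpleEMA:
    def __init__(self, model: nn.Module, decay: float = 0.999, tau: float = 2000):
        self.ema = deepcopy(model).eval()          # frozen copy holding the averaged weights
        for p in self.ema.parameters():
            p.requires_grad_(False)
        self.updates = 0
        # ramp the decay from 0 towards `decay` so early steps are not dominated
        # by the random initialization (assumed schedule)
        self.decay_fn = lambda x: decay * (1 - math.exp(-x / tau))

    @torch.no_grad()
    def update(self, model: nn.Module):
        self.updates += 1
        d = self.decay_fn(self.updates)
        msd = model.state_dict()
        for k, v in self.ema.state_dict().items():
            if v.dtype.is_floating_point:
                v.mul_(d).add_(msd[k].detach(), alpha=1 - d)   # v = d * v + (1 - d) * new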

3. Network Structure

E-ELAN:

As shown in the figure below, this part of the configuration file is the E-ELAN module. It has four paths that pass through 1 convolution, 1 convolution, 3 convolutions, and 5 convolutions respectively, with 3 of the paths sharing feature maps. Finally, the feature maps of the four paths are concatenated. After E-ELAN, the number of feature-map channels is doubled compared to the input.

(Figure: the E-ELAN block as defined in the model configuration file, and its structure)
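
As a rough sketch of this structure (not the exact wiring of the yolov7.yaml config), a four-path block of this kind could be written as follows; conv_bn_act stands in for the repo's Conv wrapper (conv + BN + SiLU) and the channel counts are illustrative:

import torch
import torch.nn as nn


def conv_bn_act(c_in, c_out, k=1):
    # stand-in for the repo's Conv wrapper (conv + BN + SiLU)
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, 1, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )


class ELANSketch(nn.Module):
    def __init__(self, c1):
        super().__init__()
        c_ = c1 // 2
        self.cv1 = conv_bn_act(c1, c_, 1)                                          # path 1: 1 conv
        self.cv2 = conv_bn_act(c1, c_, 1)                                          # path 2: 1 conv, shared by paths 3 and 4
        self.cv3 = nn.Sequential(conv_bn_act(c_, c_, 3), conv_bn_act(c_, c_, 3))   # path 3: 3 convs in total
        self.cv4 = nn.Sequential(conv_bn_act(c_, c_, 3), conv_bn_act(c_, c_, 3))   # path 4: 5 convs in total

    def forward(self, x):
        y1 = self.cv1(x)
        y2 = self.cv2(x)
        y3 = self.cv3(y2)   # shares y2
        y4 = self.cv4(y3)   # shares y2 and y3
        # concatenating the four c_-channel paths doubles the channel count: 4 * (c1 // 2) = 2 * c1
        return torch.cat([y1, y2, y3, y4], 1)


# x = torch.randn(1, 64, 80, 80); ELANSketch(64)(x).shape -> (1, 128, 80, 80)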

MPConv:

MPConv also consists of two paths: one goes through a MaxPooling layer, the other downsamples with a convolution. This can be understood as combining the downsampling results of pooling and convolution to improve performance. Finally, the two feature maps are concatenated.

(Figure: the MP downsampling block)
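
A rough sketch of such a downsampling block, under the same assumptions as the E-ELAN sketch above (plain conv + BN + SiLU as a stand-in for Conv, illustrative channel counts):

import torch
import torch.nn as nn


def conv_bn_act(c_in, c_out, k=1, s=1):
    # stand-in for the repo's Conv wrapper (conv + BN + SiLU)
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )


class MPDownSketch(nn.Module):
    def __init__(self, c1):
        super().__init__()
        c_ = c1 // 2
        self.pool_branch = nn.Sequential(
            nn.MaxPool2d(kernel_size=2, stride=2),   # downsample by pooling
            conv_bn_act(c1, c_, 1),
        )
        self.conv_branch = nn.Sequential(
            conv_bn_act(c1, c_, 1),
            conv_bn_act(c_, c_, 3, s=2),             # downsample by a stride-2 3x3 conv
        )

    def forward(self, x):
        # both branches halve the spatial size; concatenation restores c1 channels
        return torch.cat([self.pool_branch(x), self.conv_branch(x)], 1)


# x = torch.randn(1, 128, 80, 80); MPDownSketch(128)(x).shape -> (1, 128, 40, 40)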

SPPCSPC:

YOLO v7 still uses the SPP idea. The feature map first goes through 3 convolutions, then through 5×5, 9×9 and 13×13 max pooling; the three pooled feature maps are concatenated with the un-pooled feature map, so pooling with different kernel sizes fuses features from different receptive fields. The result then goes through 2 more convolutions and is finally concatenated with the feature map produced by a single convolution on the other branch.

(Figure: the SPPCSPC module)

The code is as follows:

class SPPCSPC(nn.Module):
    # CSP https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(5, 9, 13)):
        super(SPPCSPC, self).__init__()
        c_ = int(2 * c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(c_, c_, 3, 1)
        self.cv4 = Conv(c_, c_, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])
        self.cv5 = Conv(4 * c_, c_, 1, 1)
        self.cv6 = Conv(c_, c_, 3, 1)
        self.cv7 = Conv(2 * c_, c2, 1, 1)

    def forward(self, x):
        x1 = self.cv4(self.cv3(self.cv1(x)))
        y1 = self.cv6(self.cv5(torch.cat([x1] + [m(x1) for m in self.m], 1)))
        y2 = self.cv2(x)
        return self.cv7(torch.cat((y1, y2), dim=1))

PAN:

YOLO v7 still uses YOLO v5's PAN. The feature map coming out of the SPPCSPC layer is repeatedly upsampled and fused with shallower feature maps, fusing low-level and high-level information (top-down path); it is then downsampled again and fused with those maps once more (bottom-up path), so high-level and low-level information flow in both directions.

(Figure: the PAN neck, top-down and bottom-up feature fusion)
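
A minimal sketch of this top-down / bottom-up fusion pattern (assumed layout; plain convolutions stand in for the ELAN blocks and channel counts used by the real neck):

import torch
import torch.nn as nn

# feature maps from the backbone / SPPCSPC (channel counts are illustrative)
p5 = torch.randn(1, 512, 20, 20)   # deepest level, after SPPCSPC
p4 = torch.randn(1, 256, 40, 40)   # shallower level

up = nn.Upsample(scale_factor=2, mode='nearest')
reduce_p5 = nn.Conv2d(512, 256, 1)
fused_p4 = torch.cat([up(reduce_p5(p5)), p4], 1)        # top-down: high-level info flows to the shallow map

downsample = nn.Conv2d(512, 512, 3, stride=2, padding=1)
fused_p5 = torch.cat([downsample(fused_p4), p5], 1)     # bottom-up: shallow info flows back to the deep map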

HEAD:

The head first passes through a RepConv layer, which at training time consists of 3 paths: a 1×1 convolution, a 3×3 convolution, and a BN (identity) branch. After RepConv, the features are fed into the output layer.

(Figure: the RepConv block in the head)

The code is as follows:

class RepConv(nn.Module):
    # Represented convolution
    # https://arxiv.org/abs/2101.03697

    def __init__(self, c1, c2, k=3, s=1, p=None, g=1, act=True, deploy=False):
        super(RepConv, self).__init__()

        self.deploy = deploy
        self.groups = g
        self.in_channels = c1
        self.out_channels = c2

        assert k == 3
        assert autopad(k, p) == 1

        padding_11 = autopad(k, p) - k // 2

        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

        if deploy:
            self.rbr_reparam = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=True)

        else:
            self.rbr_identity = (nn.BatchNorm2d(num_features=c1) if c2 == c1 and s == 1 else None)

            self.rbr_dense = nn.Sequential(
                nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False),
                nn.BatchNorm2d(num_features=c2),
            )

            self.rbr_1x1 = nn.Sequential(
                nn.Conv2d( c1, c2, 1, s, padding_11, groups=g, bias=False),
                nn.BatchNorm2d(num_features=c2),
            )

    def forward(self, inputs):
        if hasattr(self, "rbr_reparam"):
            return self.act(self.rbr_reparam(inputs))

        if self.rbr_identity is None:
            id_out = 0
        else:
            id_out = self.rbr_identity(inputs)

        return self.act(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out)

    def get_equivalent_kernel_bias(self):
        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)
        kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)
        kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity)
        return (
            kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid,
            bias3x3 + bias1x1 + biasid,
        )

    def _pad_1x1_to_3x3_tensor(self, kernel1x1):
        if kernel1x1 is None:
            return 0
        else:
            return nn.functional.pad(kernel1x1, [1, 1, 1, 1])

    def _fuse_bn_tensor(self, branch):
        if branch is None:
            return 0, 0
        if isinstance(branch, nn.Sequential):
            kernel = branch[0].weight
            running_mean = branch[1].running_mean
            running_var = branch[1].running_var
            gamma = branch[1].weight
            beta = branch[1].bias
            eps = branch[1].eps
        else:
            assert isinstance(branch, nn.BatchNorm2d)
            if not hasattr(self, "id_tensor"):
                input_dim = self.in_channels // self.groups
                kernel_value = np.zeros(
                    (self.in_channels, input_dim, 3, 3), dtype=np.float32
                )
                for i in range(self.in_channels):
                    kernel_value[i, i % input_dim, 1, 1] = 1
                self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)
            kernel = self.id_tensor
            running_mean = branch.running_mean
            running_var = branch.running_var
            gamma = branch.weight
            beta = branch.bias
            eps = branch.eps
        std = (running_var + eps).sqrt()
        t = (gamma / std).reshape(-1, 1, 1, 1)
        return kernel * t, beta - running_mean * gamma / std

    def repvgg_convert(self):
        kernel, bias = self.get_equivalent_kernel_bias()
        return (
            kernel.detach().cpu().numpy(),
            bias.detach().cpu().numpy(),
        )

    def fuse_conv_bn(self, conv, bn):

        std = (bn.running_var + bn.eps).sqrt()
        bias = bn.bias - bn.running_mean * bn.weight / std

        t = (bn.weight / std).reshape(-1, 1, 1, 1)
        weights = conv.weight * t

        bn = nn.Identity()
        conv = nn.Conv2d(in_channels = conv.in_channels,
                              out_channels = conv.out_channels,
                              kernel_size = conv.kernel_size,
                              stride=conv.stride,
                              padding = conv.padding,
                              dilation = conv.dilation,
                              groups = conv.groups,
                              bias = True,
                              padding_mode = conv.padding_mode)

        conv.weight = torch.nn.Parameter(weights)
        conv.bias = torch.nn.Parameter(bias)
        return conv

    def fuse_repvgg_block(self):
        if self.deploy:
            return
        print(f"RepConv.fuse_repvgg_block")

        self.rbr_dense = self.fuse_conv_bn(self.rbr_dense[0], self.rbr_dense[1])

        self.rbr_1x1 = self.fuse_conv_bn(self.rbr_1x1[0], self.rbr_1x1[1])
        rbr_1x1_bias = self.rbr_1x1.bias
        weight_1x1_expanded = torch.nn.functional.pad(self.rbr_1x1.weight, [1, 1, 1, 1])

        # Fuse self.rbr_identity
        if (isinstance(self.rbr_identity, nn.BatchNorm2d) or isinstance(self.rbr_identity, nn.modules.batchnorm.SyncBatchNorm)):
            # print(f"fuse: rbr_identity == BatchNorm2d or SyncBatchNorm")
            identity_conv_1x1 = nn.Conv2d(
                    in_channels=self.in_channels,
                    out_channels=self.out_channels,
                    kernel_size=1,
                    stride=1,
                    padding=0,
                    groups=self.groups,
                    bias=False)
            identity_conv_1x1.weight.data = identity_conv_1x1.weight.data.to(self.rbr_1x1.weight.data.device)
            identity_conv_1x1.weight.data = identity_conv_1x1.weight.data.squeeze().squeeze()
            # print(f" identity_conv_1x1.weight = {identity_conv_1x1.weight.shape}")
            identity_conv_1x1.weight.data.fill_(0.0)
            identity_conv_1x1.weight.data.fill_diagonal_(1.0)
            identity_conv_1x1.weight.data = identity_conv_1x1.weight.data.unsqueeze(2).unsqueeze(3)
            # print(f" identity_conv_1x1.weight = {identity_conv_1x1.weight.shape}")

            identity_conv_1x1 = self.fuse_conv_bn(identity_conv_1x1, self.rbr_identity)
            bias_identity_expanded = identity_conv_1x1.bias
            weight_identity_expanded = torch.nn.functional.pad(identity_conv_1x1.weight, [1, 1, 1, 1])
        else:
            # print(f"fuse: rbr_identity != BatchNorm2d, rbr_identity = {self.rbr_identity}")
            bias_identity_expanded = torch.nn.Parameter( torch.zeros_like(rbr_1x1_bias) )
            weight_identity_expanded = torch.nn.Parameter( torch.zeros_like(weight_1x1_expanded) )

        #print(f"self.rbr_1x1.weight = {self.rbr_1x1.weight.shape}, ")
        #print(f"weight_1x1_expanded = {weight_1x1_expanded.shape}, ")
        #print(f"self.rbr_dense.weight = {self.rbr_dense.weight.shape}, ")

        self.rbr_dense.weight = torch.nn.Parameter(self.rbr_dense.weight + weight_1x1_expanded + weight_identity_expanded)
        self.rbr_dense.bias = torch.nn.Parameter(self.rbr_dense.bias + rbr_1x1_bias + bias_identity_expanded)

        self.rbr_reparam = self.rbr_dense
        self.deploy = True

        if self.rbr_identity is not None:
            del self.rbr_identity
            self.rbr_identity = None

        if self.rbr_1x1 is not None:
            del self.rbr_1x1
            self.rbr_1x1 = None

        if self.rbr_dense is not None:
            del self.rbr_dense
            self.rbr_dense = None

Output layer:

For the output layer, YOLO v7 adopts the idea from YOLOR. The feature map coming out of RepConv first goes through an ImplicitA layer; my understanding is that ImplicitA is essentially a learnable per-channel bias that enriches the information extracted by each channel. The feature map after ImplicitA is fed into the output convolution, and its result then goes through an ImplicitM layer; my understanding is that ImplicitM rescales the output with a learnable per-channel factor, which helps with bounding-box regression.

class ImplicitA(nn.Module):
    def __init__(self, channel, mean=0., std=.02):
        super(ImplicitA, self).__init__()
        self.channel = channel
        self.mean = mean
        self.std = std
        self.implicit = nn.Parameter(torch.zeros(1, channel, 1, 1))
        nn.init.normal_(self.implicit, mean=self.mean, std=self.std)

    def forward(self, x):
        return self.implicit + x

class ImplicitM(nn.Module):
    def __init__(self, channel, mean=1., std=.02):
        super(ImplicitM, self).__init__()
        self.channel = channel
        self.mean = mean
        self.std = std
        self.implicit = nn.Parameter(torch.ones(1, channel, 1, 1))
        nn.init.normal_(self.implicit, mean=self.mean, std=self.std)

    def forward(self, x):
        return self.implicit * x
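
A minimal usage sketch of how these two layers wrap a per-scale detection convolution (assumed wiring, following my reading of the IDetect head; shapes and channel counts are illustrative):

import torch
import torch.nn as nn

no = 3 * (80 + 5)                    # anchors_per_scale * (num_classes + 5)
det_conv = nn.Conv2d(256, no, 1)     # output convolution for one scale
ia, im = ImplicitA(256), ImplicitM(no)

x = torch.randn(1, 256, 20, 20)      # feature map from the neck, after RepConv
y = im(det_conv(ia(x)))              # (1, 255, 20, 20)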

4. Loss Function

The loss function takes 3 inputs: the predictions, the targets (labels), and the images.

The predictions are the model outputs described above. The target tensor has shape (number of objects, 6); along the second dimension, the first column is the image index within the batch, the second column is the object class, and the last four columns are the bounding box.


Candidate-box offsets

First, the targets are transformed so that they also carry an anchor index. The target tensor then has shape (number of anchors, number of targets, 7), where the last dimension of size 7 consists of: batch index, class, x, y, w, h, anchor index.
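
A toy illustration (not taken from the repo) of this layout and of the anchor-index expansion performed at the start of find_3_positive:

import torch

# targets: (num_objects, 6) = [image index, class, x, y, w, h], normalized coordinates
targets = torch.tensor([[0., 5, 0.50, 0.40, 0.20, 0.30],   # object in image 0, class 5
                        [1., 2, 0.70, 0.60, 0.10, 0.15]])  # object in image 1, class 2
na, nt = 3, targets.shape[0]                               # number of anchors, number of targets
ai = torch.arange(na).float().view(na, 1).repeat(1, nt)    # anchor index for every target
expanded = torch.cat((targets.repeat(na, 1, 1), ai[:, :, None]), 2)
print(expanded.shape)   # torch.Size([3, 2, 7]) -> batch index, class, x, y, w, h, anchor index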

The idea behind the offsets is to counter the imbalance between positive and negative samples: normally the negatives far outnumber the positives, so offsetting is used to construct more positive samples. For a target's center point, if the center shifted by the offset falls into a neighboring grid cell, the target is also considered to belong to that cell, and the corresponding anchors of that cell become positive samples.

The procedure is:

* The input target labels are in relative coordinates; x, y, w, h are first mapped onto the feature map.

* Compute the width/height ratio between the anchors and the ground-truth boxes and keep only anchors whose ratio lies within 1/r ~ r (r = 4 by default); anchors that differ too much are removed, which keeps the regression well-behaved.

* Compute the center coordinates measured both from the top-left corner and from the bottom-right corner as origin.

* For each grid cell, select the targets whose center is less than 0.5 away from the cell's left/top (or right/bottom) edge and is not on the image boundary. For example, when the center is less than 0.5 from the left edge of its cell, the center is shifted left by 0.5, and the anchors of the cell to the left also become positive samples.

Through these offsets the number of positive samples is increased. Taking the integer part of the (shifted) center gives the index of the grid cell it falls into. The function finally returns index matrices storing batch_indices, anchor_indices and grid indices, together with the corresponding anchors.

The code is as follows:

 def find_3_positive(self, p, targets):
        # Build targets for compute_loss(), input targets(image,class,x,y,w,h)
        na, nt = self.na, targets.shape[0]  # number of anchors, targets
        indices, anch = [], []
        gain = torch.ones(7, device=targets.device).long()  # normalized to gridspace gain
        #---------------------------------------------------------------------------------#
        # number of anchors,number of targets
        # indicates which anchor each target is paired with
        #---------------------------------------------------------------------------------#
        ai = torch.arange(na, device=targets.device).float().view(na, 1).repeat(1, nt)  # same as .repeat_interleave(nt)
        # ---------------------------------------------------------------------------------#
        # number of anchors,number of targets,7
        # the last dimension of size 7 is: batch index, class, x, y, w, h, anchor index
        # ---------------------------------------------------------------------------------#
        targets = torch.cat((targets.repeat(na, 1, 1), ai[:, :, None]), 2)  # append anchor indices

        # ---------------------------------------------------------------------------------#
        # if a target's center, after offsetting, falls into a neighboring grid cell, the target is also assigned to that cell and its anchors become positive samples
        # ---------------------------------------------------------------------------------#
        g = 0.5  # bias
        off = torch.tensor([[0, 0],
                            [1, 0], [0, 1], [-1, 0], [0, -1],  # j,k,l,m
                            # [1, 1], [1, -1], [-1, 1], [-1, -1],  # jk,jm,lk,lm
                            ], device=targets.device).float() * g  # offsets

        for i in range(self.nl):
            anchors = self.anchors[i]
            # feature-map h, w
            gain[2:6] = torch.tensor(p[i].shape)[[3, 2, 3, 2]]  # xyxy gain

            # Match targets to anchors
            # targets are in relative coordinates; map x, y, w, h onto the feature map
            t = targets * gain
            if nt:
                # Matches
                # compute the width/height ratio between anchors and ground-truth boxes and keep
                # pairs whose ratio is within 1/r ~ r (r = 4 by default); this keeps the regression well-behaved
                r = t[:, :, 4:6] / anchors[:, None]  # wh ratio
                j = torch.max(r, 1. / r).max(2)[0] < self.hyp['anchor_t']  # compare
                # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t']  # iou(3,n)=wh_iou(anchors(3,2), gwh(n,2))
                t = t[j]  # filter: drop anchor-target pairs whose ratio differs too much

                # Offsets
                # distance to the top-left corner, i.e. coordinates with the top-left corner as the origin
                gxy = t[:, 2:4]  # grid xy
                # distance to the bottom-right corner, i.e. coordinates with the bottom-right corner as the origin
                gxi = gain[[2, 3]] - gxy  # inverse
                # for each grid cell, select targets whose center is less than 0.5 from the cell's left/top edge and is not on the image boundary
                # e.g. with two ground-truth boxes, check whether the first and the second object each satisfy the condition,
                # then transpose the result: j is the condition along x, k is the condition along y
                j, k = ((gxy % 1. < g) & (gxy > 1.)).T
                # likewise select targets whose center is less than 0.5 from the cell's right/bottom edge and is not on the image boundary
                l, m = ((gxi % 1. < g) & (gxi > 1.)).T
                # corresponds to the five offsets above
                j = torch.stack((torch.ones_like(j), j, k, l, m))
                # the targets are repeated five times, once per offset, and we check which offsets apply, e.g. if the distance to the left
                # edge of the cell is < 0.5 the target is shifted left; note that off uses +1 but the operation below subtracts the offset
                t = t.repeat((5, 1, 1))[j]
                offsets = (torch.zeros_like(gxy)[None] + off[:, None])[j]
            else:
                t = targets[0]
                offsets = 0

            # Define batch index and class
            b, c = t[:, :2].long().T  # image, class
            gxy = t[:, 2:4]  # grid xy
            gwh = t[:, 4:6]  # grid wh
            # apply the offsets
            gij = (gxy - offsets).long()
            gi, gj = gij.T  # grid xy indices

            # Append
            a = t[:, 6].long()  # anchor indices
            indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1)))  # image, anchor, grid indices
            anch.append(anchors[a])  # anchors

        return indices, anch

Candidate-box filtering:

YOLO v7's candidate-box selection differs from both YOLO v5 and YOLOX. From the offsetting described above we end up with up to 9 candidate anchors per output layer (3 grid cells × up to 3 anchor shapes), and for these candidates the sum of the classification loss and the IoU loss is computed.


The procedure is as follows. First, the candidates from the coarse filtering step are gathered: their indices and the corresponding predictions are taken, and x, y, w, h are computed from them. Note that this computation mirrors the coarse filtering. For x and y, which are distances relative to the grid cell in the range 0~1, the offsets extend the range to -0.5~1.5, so the prediction is decoded over that same range. The w and h predictions are likewise constrained: since candidates whose width/height ratio exceeds 4 were removed in the coarse filtering, the decoded range is also limited to 0~4.

Then the IoU between each predicted box and each ground-truth box is computed, and the top-k candidates with the smallest IoU loss (largest IoU) are taken: 10 if there are more than 10 candidates, otherwise all of them. The top-k IoU values are summed and that many candidates are kept (at least 1), which effectively discards candidates with very small IoU. The IoU loss and the classification loss are combined into a weighted cost, and the candidates are selected according to this cost. If several ground-truth boxes are matched to the same candidate (anchor_matching_gt > 1), the ground-truth box with the smallest cost keeps the candidate and the other entries are set to 0. Finally, everything is gathered and the function returns the batch indices, anchor indices, grid indices, the matched ground-truth targets and the anchors, grouped per output layer.


The code is as follows:

 def build_targets(self, p, targets, imgs):

        #indices, anch = self.find_positive(p, targets)
        indices, anch = self.find_3_positive(p, targets)
        #indices, anch = self.find_4_positive(p, targets)
        #indices, anch = self.find_5_positive(p, targets)
        #indices, anch = self.find_9_positive(p, targets)

        matching_bs = [[] for pp in p]
        matching_as = [[] for pp in p]
        matching_gjs = [[] for pp in p]
        matching_gis = [[] for pp in p]
        matching_targets = [[] for pp in p]
        matching_anchs = [[] for pp in p]

        nl = len(p)

        for batch_idx in range(p[0].shape[0]):
            # take the targets belonging to this image of the batch
            b_idx = targets[:, 0]==batch_idx
            this_target = targets[b_idx]
            if this_target.shape[0] == 0:
                continue

            # take the corresponding ground-truth boxes and convert them to x1, y1, x2, y2
            txywh = this_target[:, 2:6] * imgs[batch_idx].shape[1]
            txyxy = xywh2xyxy(txywh)

            pxyxys = []
            p_cls = []
            p_obj = []
            from_which_layer = []
            all_b = []
            all_a = []
            all_gj = []
            all_gi = []
            all_anch = []

            for i, pi in enumerate(p):

                # the index can be empty, meaning this layer has no matched anchors
                # b: batch indices, a: anchor indices,
                # gj, gi: grid indices
                b, a, gj, gi = indices[i]
                idx = (b == batch_idx)
                b, a, gj, gi = b[idx], a[idx], gj[idx], gi[idx]
                all_b.append(b)
                all_a.append(a)
                all_gj.append(gj)
                all_gi.append(gi)
                all_anch.append(anch[i][idx])
                # size is the number of candidates after coarse filtering; the value is the output-layer index
                from_which_layer.append(torch.ones(size=(len(b),)) * i) # which output layer each candidate comes from

                fg_pred = pi[b, a, gj, gi]   # predictions at the matched grid cells
                p_obj.append(fg_pred[:, 4:5])
                p_cls.append(fg_pred[:, 5:])

                grid = torch.stack([gi, gj], dim=1) # grid coordinates
                # matches the candidate-box offsets: x, y are distances relative to the grid cell, range 0~1; after offsetting
                # the range is -0.5~1.5, and the sigmoid below maps predictions to 0~1, so the decoded range is also -0.5~1.5
                pxy = (fg_pred[:, :2].sigmoid() * 2. - 0.5 + grid) * self.stride[i] #/ 8.

                #pxy = (fg_pred[:, :2].sigmoid() * 3. - 1. + grid) * self.stride[i]
                # w and h are constrained as well: the coarse filter removed ratios above 4, so the decoded range is 0~4
                pwh = (fg_pred[:, 2:4].sigmoid() * 2) ** 2 * anch[i][idx] * self.stride[i] #/ 8.

                pxywh = torch.cat([pxy, pwh], dim=-1)
                pxyxy = xywh2xyxy(pxywh)    # convert to x1, y1, x2, y2
                pxyxys.append(pxyxy)

            pxyxys = torch.cat(pxyxys, dim=0)
            if pxyxys.shape[0] == 0:
                continue
            # gather the results of the three layers; some layers may be empty
            p_obj = torch.cat(p_obj, dim=0)
            p_cls = torch.cat(p_cls, dim=0)
            from_which_layer = torch.cat(from_which_layer, dim=0)
            all_b = torch.cat(all_b, dim=0)
            all_a = torch.cat(all_a, dim=0)
            all_gj = torch.cat(all_gj, dim=0)
            all_gi = torch.cat(all_gi, dim=0)
            all_anch = torch.cat(all_anch, dim=0)

            # pairwise IoU between ground-truth and predicted boxes
            pair_wise_iou = box_iou(txyxy, pxyxys)

            pair_wise_iou_loss = -torch.log(pair_wise_iou + 1e-8)

            # take the top-k candidates with the largest IoU (smallest IoU loss): 10 if there are more than 10 candidates, otherwise all of them
            top_k, _ = torch.topk(pair_wise_iou, min(10, pair_wise_iou.shape[1]), dim=1)
            # sum the top-k IoUs and keep that many candidates (at least 1); candidates with very small IoU are effectively discarded
            dynamic_ks = torch.clamp(top_k.sum(1).int(), min=1)

            # convert the class labels to one-hot encoding
            gt_cls_per_image = (
                F.one_hot(this_target[:, 1].to(torch.int64), self.nc)
                .float()
                .unsqueeze(1)
                .repeat(1, pxyxys.shape[0], 1)
            )

            # number of ground-truth boxes
            num_gt = this_target.shape[0]
            # predicted probability: p_obj * p_cls
            cls_preds_ = (
                p_cls.float().unsqueeze(0).repeat(num_gt, 1, 1).sigmoid_()
                * p_obj.unsqueeze(0).repeat(num_gt, 1, 1).sigmoid_()
            )

            y = cls_preds_.sqrt_()
            pair_wise_cls_loss = F.binary_cross_entropy_with_logits(
               torch.log(y/(1-y)) , gt_cls_per_image, reduction="none"
            ).sum(-1)
            del cls_preds_

            # total cost: cls_loss + 3 * iou_loss
            cost = (
                pair_wise_cls_loss
                + 3.0 * pair_wise_iou_loss
            )

            matching_matrix = torch.zeros_like(cost)
            # for each ground truth, take the dynamic_k candidates with the smallest cost
            for gt_idx in range(num_gt):
                _, pos_idx = torch.topk(
                    cost[gt_idx], k=dynamic_ks[gt_idx].item(), largest=False
                )
                matching_matrix[gt_idx][pos_idx] = 1.0

            del top_k, dynamic_ks
            # sum over ground truths (per candidate)
            anchor_matching_gt = matching_matrix.sum(0)
            # handle the case where several ground-truth boxes matched the same candidate (anchor_matching_gt > 1)
            if (anchor_matching_gt > 1).sum() > 0:
                # find which ground-truth box has the smallest cost with this candidate
                _, cost_argmin = torch.min(cost[:, anchor_matching_gt > 1], dim=0)
                matching_matrix[:, anchor_matching_gt > 1] *= 0.0
                # the ground truth with the smallest cost keeps the candidate; the other entries are set to 0
                matching_matrix[cost_argmin, anchor_matching_gt > 1] = 1.0
            # positive-sample mask
            fg_mask_inboxes = matching_matrix.sum(0) > 0.0
            # index of the ground truth matched to each positive sample
            matched_gt_inds = matching_matrix[:, fg_mask_inboxes].argmax(0)

            # gather
            from_which_layer = from_which_layer[fg_mask_inboxes]
            all_b = all_b[fg_mask_inboxes]
            all_a = all_a[fg_mask_inboxes]
            all_gj = all_gj[fg_mask_inboxes]
            all_gi = all_gi[fg_mask_inboxes]
            all_anch = all_anch[fg_mask_inboxes]

            # matched ground-truth boxes
            this_target = this_target[matched_gt_inds]

            # group the information by the three output layers
            for i in range(nl):
                layer_idx = from_which_layer == i
                matching_bs[i].append(all_b[layer_idx])
                matching_as[i].append(all_a[layer_idx])
                matching_gjs[i].append(all_gj[layer_idx])
                matching_gis[i].append(all_gi[layer_idx])
                matching_targets[i].append(this_target[layer_idx])
                matching_anchs[i].append(all_anch[layer_idx])

        # per output layer, gather the information over the whole batch
        for i in range(nl):
            if matching_targets[i] != []:
                matching_bs[i] = torch.cat(matching_bs[i], dim=0)
                matching_as[i] = torch.cat(matching_as[i], dim=0)
                matching_gjs[i] = torch.cat(matching_gjs[i], dim=0)
                matching_gis[i] = torch.cat(matching_gis[i], dim=0)
                matching_targets[i] = torch.cat(matching_targets[i], dim=0)
                matching_anchs[i] = torch.cat(matching_anchs[i], dim=0)
            else:
                matching_bs[i] = torch.tensor([], device='cuda:0', dtype=torch.int64)
                matching_as[i] = torch.tensor([], device='cuda:0', dtype=torch.int64)
                matching_gjs[i] = torch.tensor([], device='cuda:0', dtype=torch.int64)
                matching_gis[i] = torch.tensor([], device='cuda:0', dtype=torch.int64)
                matching_targets[i] = torch.tensor([], device='cuda:0', dtype=torch.int64)
                matching_anchs[i] = torch.tensor([], device='cuda:0', dtype=torch.int64)

        return matching_bs, matching_as, matching_gjs, matching_gis, matching_targets, matching_anchs

Loss terms:

The loss terms are unchanged from YOLO v5 r6.1: a classification loss, an objectness (object-presence confidence) loss, and a bounding-box regression loss. The classification and objectness losses both use binary cross-entropy, and the box regression loss uses CIoU.
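
A hedged sketch of how these three terms are combined for one output scale, in the spirit of compute_loss; plain IoU stands in here for the repo's CIoU, and the gain values are illustrative assumptions:

import torch
import torch.nn as nn

bce_cls = nn.BCEWithLogitsLoss()   # classification: binary cross-entropy on the class logits
bce_obj = nn.BCEWithLogitsLoss()   # objectness: binary cross-entropy on the confidence logit

def box_iou_aligned(a, b, eps=1e-7):
    # plain IoU of aligned (x1, y1, x2, y2) box pairs; the repo uses bbox_iou(..., CIoU=True) instead
    lt = torch.max(a[:, :2], b[:, :2])
    rb = torch.min(a[:, 2:], b[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(1)
    area_a = (a[:, 2:] - a[:, :2]).prod(1)
    area_b = (b[:, 2:] - b[:, :2]).prod(1)
    return inter / (area_a + area_b - inter + eps)

def loss_for_one_scale(pred_box, tgt_box, pred_cls, tgt_cls, pred_obj, tgt_obj):
    lbox = (1.0 - box_iou_aligned(pred_box, tgt_box)).mean()   # box regression loss (CIoU in the repo)
    lcls = bce_cls(pred_cls, tgt_cls)                          # classification loss
    lobj = bce_obj(pred_obj, tgt_obj)                          # objectness loss
    return 0.05 * lbox + 0.7 * lobj + 0.3 * lcls               # gains are illustrative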

Auxiliary-head network structure:

When auxiliary outputs are added, the network input is 1280×1280 and the first layer has to downsample the image. To save parameters, the authors implement this first downsampling by interval sampling of the pixels (the ReOrg layer below).

(Figure: the ReOrg interval-sampling layer)

The code is as follows:

class ReOrg(nn.Module):
    def __init__(self):
        super(ReOrg, self).__init__()

    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
        return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1)

With auxiliary outputs there are eight output layers in total, and the auxiliary outputs are handled differently in the loss. In label assignment, the offset for the auxiliary head is 1, so each target covers 5 grid cells; and in the coarse candidate filtering, the auxiliary head uses top_k = 20. These choices increase the recall of the auxiliary head, i.e. the auxiliary head focuses more on recall. In the loss, the auxiliary head's contribution is scaled by 0.25.


Model reparameterization:

Conv + BN reparameterization: the BN layer computes

y = γ · (x − μ) / √(σ² + ε) + β,

which can be rewritten in the form y = W_bn · x + b_bn, so the BN layer has an equivalent weight and bias:

W_bn = γ / √(σ² + ε),  b_bn = β − γ · μ / √(σ² + ε).

Substituting the convolution output W_conv · x (+ b_conv) into this expression gives the fused weight matrix and bias:

W_fused = W_bn · W_conv,  b_fused = W_bn · b_conv + b_bn  (with b_conv = 0 when the convolution has no bias).

For the reparameterization of the 1×1 and 3×3 convolutions, the 1×1 kernel is zero-padded to 3×3 and added to the 3×3 kernel (see _pad_1x1_to_3x3_tensor in the RepConv code above).

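A small numerical check (illustrative, not from the repo) that fusing a bias-free convolution with its BN layer according to the formulas above reproduces the original output:

import torch
import torch.nn as nn

conv = nn.Conv2d(8, 16, 3, padding=1, bias=False)
bn = nn.BatchNorm2d(16).eval()
bn.running_mean.normal_()            # fake running statistics so the check is non-trivial
bn.running_var.uniform_(0.5, 2.0)

std = (bn.running_var + bn.eps).sqrt()
fused = nn.Conv2d(8, 16, 3, padding=1, bias=True)
with torch.no_grad():
    fused.weight.copy_(conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1))  # W_fused = W_bn * W_conv
    fused.bias.copy_(bn.bias - bn.running_mean * bn.weight / std)             # b_fused = b_bn (conv has no bias)

x = torch.randn(1, 8, 32, 32)
print(torch.allclose(bn(conv(x)), fused(x), atol=1e-5))  # True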

Original: https://blog.csdn.net/qq_52053775/article/details/127637735
Author: 樱花的浪漫
Title: YOLO V7源码解析
