PyTorch: An Introduction to Object Detection

July 9, 2023, 8:55 AM • Artificial Intelligence • 57 views

Object Detection

1. Introduction

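In image classification we assume that an image contains a single main object, and we only care about identifying its class. Often, however, an image contains several objects of interest, and we want to know not only their classes but also where each of them is located in the image. In computer vision this task is called object detection (or object recognition).
Object detection addresses the following questions:

Classification: which class a given region of the image belongs to;
Localization: where in the image the object appears;
Size: how large the object is.

Bounding boxes
In object detection, a rectangular bounding box describes the spatial location of an object. The origin (0, 0) is the top-left corner of the image, the x axis points to the right, and the y axis points down.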

%matplotlib inline
import torch
from d2l import torch as d2l

d2l.set_figsize()
img = d2l.plt.imread('../img/catdog.jpg')
d2l.plt.imshow(img);

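A bounding box can then be represented in two ways:

by the image coordinates of its top-left and bottom-right corners, or
by the center coordinates of the rectangle together with its width and height.

The two representations can be converted into each other: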


def box_corner_to_center(boxes):
    """
    Convert from (top-left, bottom-right) corners to (center x, center y, width, height).
    Args:
        boxes: tensor of shape (batch_size, 4)
    Returns:
        boxes: tensor of shape (batch_size, 4)
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]

    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2

    w = x2 - x1
    h = y2 - y1

    boxes = torch.stack((cx, cy, w, h), axis=-1)
    return boxes

def box_center_to_corner(boxes):
    """
    Convert from (center x, center y, width, height) to (top-left, bottom-right) corners.
    Args:
        boxes: tensor of shape (batch_size, 4)
    Returns:
        boxes: tensor of shape (batch_size, 4)
    """
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]

    x1 = cx - 0.5 * w
    y1 = cy - 0.5 * h

    x2 = cx + 0.5 * w
    y2 = cy + 0.5 * h

    boxes = torch.stack((x1, y1, x2, y2), axis=-1)
    return boxes

dog_bbox, cat_bbox = [60.0, 45.0, 378.0, 516.0], [400.0, 112.0, 655.0, 493.0]
boxes = torch.tensor((dog_bbox, cat_bbox))
box_center_to_corner(box_corner_to_center(boxes)) == boxes

Output:

tensor([[True, True, True, True],
        [True, True, True, True]])
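
To check the conversion by hand, here is a minimal plain-Python sketch for a single box (no PyTorch needed). The helper names `corner_to_center` and `center_to_corner` are illustrative and not part of d2l; the coordinates are those of the dog box used above.

```python
def corner_to_center(box):
    """(x1, y1, x2, y2) -> (cx, cy, w, h) for one box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def center_to_corner(box):
    """(cx, cy, w, h) -> (x1, y1, x2, y2) for one box."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


dog_bbox = (60.0, 45.0, 378.0, 516.0)
dog_center = corner_to_center(dog_bbox)
print(dog_center)                                # (219.0, 280.5, 318.0, 471.0)
print(center_to_corner(dog_center) == dog_bbox)  # True
```

The round trip is exact here because every intermediate value is representable as a float; with arbitrary coordinates a small tolerance would be the safer comparison.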

After the bounding boxes are drawn on the image, the main outline of each object lies essentially inside its box.


def bbox_to_rect(bbox, color):
    """Convert a (x1, y1, x2, y2) box into a matplotlib Rectangle patch."""
    return d2l.plt.Rectangle(
        xy=(bbox[0], bbox[1]),
        width=bbox[2]-bbox[0],
        height=bbox[3]-bbox[1],
        fill=False,
        edgecolor=color,
        linewidth=2
    )

fig = d2l.plt.imshow(img)
fig.axes.add_patch(bbox_to_rect(dog_bbox, 'blue'))
fig.axes.add_patch(bbox_to_rect(cat_bbox, 'red'));

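2. Detection Regions: Anchor Boxes

Object detection algorithms usually start by extracting candidate regions. Typically, a large number of regions are sampled from the input image; the algorithm then decides whether each region contains an object of interest and adjusts the region's boundaries so that the object's ground-truth bounding box is predicted more accurately.
One such method is introduced here: anchor boxes, which are multiple bounding boxes with different scales and aspect ratios generated around each pixel as the center.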

Generating multiple anchor boxes
Let the input image have height $h$ and width $w$.
For every pixel of the image we generate anchor boxes of different shapes, with scale $s \in (0, 1]$ and aspect ratio (width to height) $r > 0$.
With $s = \frac{w_a h_a}{w h}$ and $r = \frac{w_a}{h_a}$, the anchor width $w_a$ and height $h_a$ are $w_a = \sqrt{r s w h}$ and $h_a = \sqrt{\frac{s w h}{r}}$.

In the code below the coordinates are normalized by the image width and height, so the returned anchor coordinates are fractions of the image size (a box that extends past the image border falls slightly outside the range from 0 to 1). The normalized width and height are therefore computed as
$w_{as} = \sqrt{\frac{r s h}{w}}$, $h_{as} = \sqrt{\frac{s w}{r h}}$.
When drawing, the width and height in pixels are recovered by
$w_a = w \cdot \sqrt{\frac{r s h}{w}}$, $h_a = h \cdot \sqrt{\frac{s w}{r h}}$.
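
As a quick sanity check of these formulas, the following plain-Python sketch computes the anchor size both in pixels and in normalized form and confirms that the two agree. The function names and the 561 × 728 image size are assumptions made for illustration only.

```python
import math


def anchor_size_pixels(s, r, h, w):
    """Anchor width and height in pixels: w_a = sqrt(r*s*w*h), h_a = sqrt(s*w*h/r)."""
    return math.sqrt(r * s * w * h), math.sqrt(s * w * h / r)


def anchor_size_normalized(s, r, h, w):
    """Anchor width and height as fractions of the image width and height."""
    return math.sqrt(r * s * h / w), math.sqrt(s * w / (r * h))


h, w = 561, 728   # assumed image height and width in pixels
s, r = 0.75, 2.0  # scale and width-to-height ratio

w_a, h_a = anchor_size_pixels(s, r, h, w)
w_as, h_as = anchor_size_normalized(s, r, h, w)

print(math.isclose(w_a / h_a, r))          # aspect ratio is r
print(math.isclose(w_a * h_a, s * w * h))  # area is s times the image area
print(math.isclose(w_as * w, w_a))         # normalized width scaled back by w
print(math.isclose(h_as * h, h_a))         # normalized height scaled back by h
```

All four checks print True, which confirms that scaling the normalized width by $w$ and the normalized height by $h$ recovers the pixel sizes.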


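def multibox_prior(data, sizes, ratios):
    """
    Generate anchor boxes of different shapes centered on each pixel.
    Args:
        data: tensor of shape (batch size, channels, height, width)
        sizes: 1-D sequence of scales
        ratios: 1-D sequence of aspect ratios (width to height)
    """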

    in_height, in_width = data.shape[-2:]
    device, num_sizes, num_ratios = data.device, len(sizes), len(ratios)

    boxes_per_pixel = (num_sizes+num_ratios-1)
    size_tensor = torch.tensor(sizes, device=device)
    ratio_tensor = torch.tensor(ratios, device=device)

    offset_h, offset_w = 0.5, 0.5

    steps_h = 1.0 / in_height
    steps_w = 1.0 / in_width

    center_h = (torch.arange(in_height, device=device) + offset_h) * steps_h
    center_w = (torch.arange(in_width, device=device) + offset_w) * steps_w

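    # indexing='ij' keeps the row-major (y, x) layout and avoids the warning
    # that recent PyTorch versions emit when the argument is omitted
    shift_y, shift_x = torch.meshgrid(center_h, center_w, indexing='ij')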

    shift_y, shift_x = shift_y.reshape(-1), shift_x.reshape(-1)

    # Normalized widths and heights of the (num_sizes + num_ratios - 1) shapes:
    # every size paired with ratios[0], then sizes[0] paired with the remaining ratios.
    # w_as = sqrt(r * s * h / w), h_as = sqrt(s * w / (r * h))
    w = torch.cat((torch.sqrt(size_tensor * ratio_tensor[0] * in_height / in_width),
                   torch.sqrt(sizes[0] * ratio_tensor[1:] * in_height / in_width)))
    h = torch.cat((torch.sqrt(size_tensor * in_width / (ratio_tensor[0] * in_height)),
                   torch.sqrt(sizes[0] * in_width / (ratio_tensor[1:] * in_height))))

    anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(
                                        in_height * in_width, 1) / 2

    out_grid = torch.stack([shift_x, shift_y, shift_x, shift_y],
                dim=1).repeat_interleave(boxes_per_pixel, dim=0)

    output = out_grid + anchor_manipulations

    return output.unsqueeze(0)

Test the function:

torch.set_printoptions(precision=2)  # print two decimals, as in the output below
d2l.set_figsize()
img = d2l.plt.imread('./img/catdog.jpg')
h, w = img.shape[:2]

print(h, w)
X = torch.rand(size=(1, 3, h, w))
Y = multibox_prior(X, sizes=[0.75, 0.5, 0.25], ratios=[1, 2, 0.5])
boxes = Y.reshape(h, w, 5, 4)
print(boxes[250, 250, :, :])

Output:

tensor([[-0.04, -0.05,  0.72,  0.94],
        [ 0.03,  0.04,  0.65,  0.85],
        [ 0.12,  0.16,  0.56,  0.73],
        [-0.19,  0.10,  0.88,  0.80],
        [ 0.08, -0.25,  0.61,  1.14]])
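
The five boxes per pixel come from pairing every scale with the first ratio and the first scale with each remaining ratio, which is why the function uses `num_sizes + num_ratios - 1`. The sketch below lists those pairs and counts the anchors for a whole image; `anchor_shape_pairs` is a hypothetical helper, and the 561 × 728 image size is assumed for illustration.

```python
def anchor_shape_pairs(sizes, ratios):
    """Return the (scale, ratio) pairs in the order multibox_prior stacks them."""
    pairs = [(s, ratios[0]) for s in sizes]        # all sizes with the first ratio
    pairs += [(sizes[0], r) for r in ratios[1:]]   # first size with the other ratios
    return pairs


pairs = anchor_shape_pairs([0.75, 0.5, 0.25], [1, 2, 0.5])
for s, r in pairs:
    print(f"s={s}, r={r}")

h, w = 561, 728  # assumed image height and width in pixels
total_anchors = h * w * len(pairs)
print(total_anchors)  # 2042040
```

Even for a modest image this is about two million anchor boxes, which is why practical detectors generate anchors on coarser feature maps rather than on every pixel.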

Define a function that draws all anchor boxes centered on a given pixel:

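def show_bboxes(axes, bboxes, labels=None, colors=None):
    """
    Draw all bounding boxes centered on a given pixel.
    Args:
        axes: matplotlib axes to draw on
        bboxes: tensor with the coordinates of all anchor boxes of one pixel
        labels: list of bounding-box labels
        colors: list of bounding-box colors
    """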
    def _make_list(obj, default_values=None):
        if obj is None:
            obj = default_values
        elif not isinstance(obj, (list, tuple)):
            obj = [obj]
        return obj

    labels = _make_list(labels)
    colors = _make_list(colors, ['b', 'g', 'r', 'm', 'c'])
    for i, bbox in enumerate(bboxes):

        color = colors[i % len(colors)]
        rect = bbox_to_rect(bbox.detach().numpy(), color)
        axes.add_patch(rect)

        if labels and len(labels) > i:
            text_color = 'k' if color == 'w' else 'w'
            axes.text(rect.xy[0], rect.xy[1], labels[i],
                      va='center', ha='center', fontsize=9, color=text_color,
                      bbox=dict(facecolor=color, lw=0))

d2l.set_figsize()
bbox_scale = torch.tensor((w, h, w, h))  # converts normalized coordinates to pixels
fig = d2l.plt.imshow(img)
show_bboxes(fig.axes, boxes[250, 250, :, :] * bbox_scale,
            ['s=0.75, r=1', 's=0.5, r=1', 's=0.25, r=1', 's=0.75, r=2',
             's=0.75, r=0.5'])

Output: a figure showing the five anchor boxes centered at pixel (250, 250), drawn over the image.

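3. Measuring the Similarity Between Anchor Boxes and Ground-Truth Boxes: Intersection over Union (IoU)

How can we quantify the similarity between an anchor box and a ground-truth bounding box?
Jaccard index: measures the similarity between two sets A and B, J(A, B) = |A ∩ B| / |A ∪ B|.
Intersection over union (IoU): treating each bounding box as a set of pixels, we measure the similarity of two bounding boxes with the Jaccard index. The result is the ratio of the area of their intersection to the area of their union, and it ranges from 0 to 1.

def box_iou(boxes1, boxes2):
    """Compute the pairwise IoU between two lists of anchor or bounding boxes."""
    # boxes1: (n1, 4), boxes2: (n2, 4), both in (x1, y1, x2, y2) corner format
    box_area = lambda boxes: ((boxes[:, 2] - boxes[:, 0]) *
                              (boxes[:, 3] - boxes[:, 1]))

    areas1 = box_area(boxes1)  # (n1,)
    areas2 = box_area(boxes2)  # (n2,)

    # Broadcasting over a new axis yields every (box1, box2) pair: (n1, n2, 2)
    inter_upperlefts = torch.max(boxes1[:, None, :2], boxes2[:, :2])
    inter_lowerrights = torch.min(boxes1[:, None, 2:], boxes2[:, 2:])
    # Non-overlapping pairs get zero width or height instead of negative values
    inters = (inter_lowerrights - inter_upperlefts).clamp(min=0)

    inter_areas = inters[:, :, 0] * inters[:, :, 1]  # (n1, n2)
    union_areas = areas1[:, None] + areas2 - inter_areas
    return inter_areas / union_areas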
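
For intuition, the same computation can be worked through by hand on a single pair of boxes. This is a plain-Python sketch with a hypothetical helper `iou`, not the batched function above.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when the boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union


# Two 2x2 squares overlapping in a 1x1 square: intersection 1, union 4 + 4 - 1 = 7
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0 for identical boxes
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0 for disjoint boxes
```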

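4. Labeling Anchor Boxes in the Training Data

In the training set:
Each anchor box is treated as a training example. To train an object detection model, every anchor box needs a class label (the class of the object associated with the anchor box) and an offset label (the offset of the ground-truth bounding box relative to the anchor box).

Given an image, let the anchor boxes be $A_1, A_2, \ldots, A_{n_a}$ and the ground-truth bounding boxes be $B_1, B_2, \ldots, B_{n_b}$, where $n_a \geq n_b$.
Define a matrix $X \in \mathbb{R}^{n_a \times n_b}$ whose element $x_{ij}$ in row $i$ and column $j$ is the IoU of anchor box $A_i$ and ground-truth bounding box $B_j$.
The algorithm proceeds as follows:

Find the largest element of $X$ and denote its row and column indices by $i_1$ and $j_1$. Assign the ground-truth bounding box $B_{j_1}$ to the anchor box $A_{i_1}$, because $A_{i_1}$ and $B_{j_1}$ are the closest of all anchor and ground-truth pairs. After this first assignment, discard all elements in row $i_1$ and column $j_1$ of the matrix.
Find the largest of the remaining elements of $X$ and denote its row and column indices by $i_2$ and $j_2$. Assign the ground-truth bounding box $B_{j_2}$ to the anchor box $A_{i_2}$, and discard all elements in row $i_2$ and column $j_2$.
At this point the elements in two rows and two columns have been discarded. Continue in the same way until all $n_b$ columns of $X$ have been discarded. Each of $n_b$ anchor boxes has now been assigned a ground-truth bounding box.
Finally, traverse only the remaining $n_a - n_b$ anchor boxes. For an anchor box $A_i$, find in row $i$ of $X$ the ground-truth bounding box $B_j$ with the largest IoU, and assign $B_j$ to $A_i$ only if that IoU is greater than a predefined threshold.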
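
The steps above can be sketched in plain Python on a small IoU matrix. This is an illustration under stated assumptions rather than the implementation used in this post: `assign_anchors` is a hypothetical helper and the IoU values are made up.

```python
def assign_anchors(iou, threshold=0.5):
    """Return, for each anchor (row), the index of its assigned ground-truth box or -1."""
    n_a, n_b = len(iou), len(iou[0])
    assignment = [-1] * n_a
    free_rows, free_cols = set(range(n_a)), set(range(n_b))

    # Greedy part: repeatedly take the largest remaining IoU,
    # then discard its row and its column.
    while free_rows and free_cols:
        i, j = max(((i, j) for i in free_rows for j in free_cols),
                   key=lambda ij: iou[ij[0]][ij[1]])
        assignment[i] = j
        free_rows.remove(i)
        free_cols.remove(j)

    # Remaining anchors: take the best ground-truth box if the IoU is high enough
    # (>= mirrors the comparison in assign_anchor_to_bbox).
    for i in sorted(free_rows):
        j = max(range(n_b), key=lambda col: iou[i][col])
        if iou[i][j] >= threshold:
            assignment[i] = j
    return assignment


# Hypothetical IoU matrix: 5 anchors (rows) x 2 ground-truth boxes (columns)
iou_matrix = [
    [0.05, 0.00],
    [0.14, 0.00],
    [0.00, 0.57],
    [0.00, 0.21],
    [0.00, 0.75],
]
assignment = assign_anchors(iou_matrix)
print(assignment)  # [-1, 0, 1, -1, 1]
```

Anchor 1 receives ground-truth box 0 even though its IoU is only 0.14: the greedy part guarantees that every ground-truth box gets at least one anchor, and the threshold applies only to the anchors left over afterwards.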


def assign_anchor_to_bbox(ground_truth, anchors, device, iou_threshold=0.5):
    """Assign the closest ground-truth bounding box to each anchor box."""

    num_anchors, num_gt_boxes = anchors.shape[0], ground_truth.shape[0]

    jaccard = box_iou(anchors, ground_truth)

    anchors_bbox_map = torch.full((num_anchors,), -1, dtype=torch.long,
                                  device=device)

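    # For every anchor: the best-matching ground-truth box and its IoU
    max_ious, indices = torch.max(jaccard, dim=1)

    # Anchors whose best IoU reaches the threshold get that ground-truth box
    anc_i = torch.nonzero(max_ious >= iou_threshold).reshape(-1)

    box_j = indices[max_ious >= iou_threshold]

    anchors_bbox_map[anc_i] = box_j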

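    col_discard = torch.full((num_anchors,), -1)
    row_discard = torch.full((num_gt_boxes,), -1)
    # Greedy pass: each ground-truth box is assigned to its best remaining anchor,
    # overriding the threshold-based assignment above where necessary
    for _ in range(num_gt_boxes):

        max_idx = torch.argmax(jaccard)  # index into the flattened IoU matrix
        box_idx = (max_idx % num_gt_boxes).long()
        # Integer floor division avoids a detour through floating point
        anc_idx = torch.div(max_idx, num_gt_boxes, rounding_mode='floor').long()

        anchors_bbox_map[anc_idx] = box_idx

        # Discard the used column and row by filling them with -1
        jaccard[:, box_idx] = col_discard
        jaccard[anc_idx, :] = row_discard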

    return anchors_bbox_map

Once a ground-truth bounding box has been assigned to an anchor box, the anchor box also has its class label. What remains is to label its offset relative to the assigned ground-truth bounding box.
Labeling offsets
Suppose an anchor box A has been assigned a ground-truth bounding box B. The class of A is then labeled as the class of B, and the offset of A is labeled from the relative position of the centers of B and A and from the relative sizes of the two boxes. Intuitively, one could label the offset with the raw coordinate differences between the anchor box and the ground-truth box, but such labels are hard to learn from because the boxes in a dataset vary widely in position and size. A transformation is therefore applied so that the offsets are more evenly distributed and easier to fit. Given boxes A and B with center coordinates $(x_a, y_a)$ and $(x_b, y_b)$, widths $w_a$ and $w_b$, and heights $h_a$ and $h_b$, the offset of A is labeled as
$$\left( \frac{\frac{x_b - x_a}{w_a} - \mu_x}{\sigma_x}, \frac{\frac{y_b - y_a}{h_a} - \mu_y}{\sigma_y}, \frac{\log \frac{w_b}{w_a} - \mu_w}{\sigma_w}, \frac{\log \frac{h_b}{h_a} - \mu_h}{\sigma_h} \right)$$
where the constants default to $\mu_x = \mu_y = \mu_w = \mu_h = 0$, $\sigma_x = \sigma_y = 0.1$, and $\sigma_w = \sigma_h = 0.2$.
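
The formula can be evaluated by hand for one pair of boxes. The sketch below is plain Python with a hypothetical helper `box_offset`; the two boxes are given directly in center form (cx, cy, w, h) and correspond to anchor 4 and the cat box of the test further down in this post.

```python
import math


def box_offset(anchor, gt, mu=(0.0, 0.0, 0.0, 0.0), sigma=(0.1, 0.1, 0.2, 0.2)):
    """Offset of ground-truth box `gt` relative to `anchor`, both as (cx, cy, w, h)."""
    xa, ya, wa, ha = anchor
    xb, yb, wb, hb = gt
    raw = ((xb - xa) / wa, (yb - ya) / ha, math.log(wb / wa), math.log(hb / ha))
    # Standardize each component with its mean and standard deviation
    return tuple((v - m) / s for v, m, s in zip(raw, mu, sigma))


anchor = (0.745, 0.60, 0.35, 0.60)  # corners (0.57, 0.3, 0.92, 0.9) in center form
gt_box = (0.725, 0.54, 0.35, 0.68)  # corners (0.55, 0.2, 0.9, 0.88) in center form

offsets = box_offset(anchor, gt_box)
print([round(v, 3) for v in offsets])  # [-0.571, -1.0, 0.0, 0.626]
```

With the default constants, dividing by 0.1 and 0.2 amounts to multiplying by 10 and 5, which is where the factors in `offset_boxes` come from.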


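def offset_boxes(anchors, assigned_bb, eps=1e-6):
    """Transform anchor box offsets (encode the assigned boxes relative to the anchors)."""

    c_anc = d2l.box_corner_to_center(anchors)
    c_assigned_bb = d2l.box_corner_to_center(assigned_bb)

    # Dividing by sigma_x = sigma_y = 0.1 gives the factor 10;
    # dividing by sigma_w = sigma_h = 0.2 gives the factor 5
    offset_xy = 10 * (c_assigned_bb[:, :2] - c_anc[:, :2]) / c_anc[:, 2:]
    # eps keeps the logarithm finite for unassigned anchors, whose assigned box is all zeros
    offset_wh = 5 * torch.log(eps + c_assigned_bb[:, 2:] / c_anc[:, 2:])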

    offset = torch.cat([offset_xy, offset_wh], axis=1)
    return offset

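If an anchor box is not assigned a ground-truth bounding box, we simply label its class as "background". Anchor boxes labeled as background are usually called negative anchors, and the rest are called positive anchors. The following multibox_target function uses the ground-truth bounding boxes (the labels argument) to label the classes and offsets of the anchor boxes (the anchors argument). It sets the index of the background class to zero and increments the integer index of every real class by one.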
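
The labeling rule can be sketched in a few lines of plain Python. `labels_and_mask` is a hypothetical helper, and the assignment list is chosen for illustration so that it is consistent with the class labels printed in the test below.

```python
def labels_and_mask(assignment, gt_classes):
    """Turn an anchor-to-box assignment into class labels and an offset mask.

    assignment[i] is the index of the ground-truth box assigned to anchor i, or -1.
    """
    # Background is class 0; every real class index is shifted up by one
    class_labels = [0 if j < 0 else gt_classes[j] + 1 for j in assignment]
    # One mask entry per offset component: 1.0 for positive anchors, 0.0 for negative ones
    mask = [float(j >= 0) for j in assignment for _ in range(4)]
    return class_labels, mask


class_labels, mask = labels_and_mask([-1, 0, 1, -1, 1], gt_classes=[0, 1])
print(class_labels)  # [0, 1, 2, 0, 2]
print(mask)
```

Multiplying the offsets by this mask zeroes out the entries of negative anchors, so they do not contribute to the offset loss.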


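def multibox_target(anchors, labels):
    """Label anchor boxes using ground-truth bounding boxes.

    labels has shape (batch_size, num_gt_boxes, 5); each row is (class, x1, y1, x2, y2).
    """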

    batch_size, anchors = labels.shape[0], anchors.squeeze(0)
    batch_offset, batch_mask, batch_class_labels = [], [], []
    device, num_anchors = anchors.device, anchors.shape[0]

    for i in range(batch_size):

        label = labels[i, :, :]
        anchors_bbox_map = assign_anchor_to_bbox(
            label[:, 1:], anchors, device)

        bbox_mask = ((anchors_bbox_map >= 0).float().unsqueeze(-1)).repeat(
            1, 4)

        class_labels = torch.zeros(num_anchors, dtype=torch.long,
                                   device=device)
        assigned_bb = torch.zeros((num_anchors, 4), dtype=torch.float32,
                                  device=device)

        indices_true = torch.nonzero(anchors_bbox_map >= 0)
        bb_idx = anchors_bbox_map[indices_true]
        class_labels[indices_true] = label[bb_idx, 0].long() + 1
        assigned_bb[indices_true] = label[bb_idx, 1:]

        offset = offset_boxes(anchors, assigned_bb) * bbox_mask
        batch_offset.append(offset.reshape(-1))
        batch_mask.append(bbox_mask.reshape(-1))
        batch_class_labels.append(class_labels)
    bbox_offset = torch.stack(batch_offset)
    bbox_mask = torch.stack(batch_mask)
    class_labels = torch.stack(batch_class_labels)
    return (bbox_offset, bbox_mask, class_labels)

Test:
Define the ground-truth labels and the anchor boxes for an image:

ground_truth = torch.tensor([[0, 0.1, 0.08, 0.52, 0.92],
                         [1, 0.55, 0.2, 0.9, 0.88]])
anchors = torch.tensor([[0, 0.1, 0.2, 0.3], [0.15, 0.2, 0.4, 0.4],
                    [0.63, 0.05, 0.88, 0.98], [0.66, 0.45, 0.8, 0.8],
                    [0.57, 0.3, 0.92, 0.9]])

fig = d2l.plt.imshow(img)
show_bboxes(fig.axes, ground_truth[:, 1:] * bbox_scale, ['dog', 'cat'], 'k')
show_bboxes(fig.axes, anchors * bbox_scale, ['0', '1', '2', '3', '4']);

Test the multibox_target function defined above:

labels = multibox_target(anchors.unsqueeze(dim=0),
                         ground_truth.unsqueeze(dim=0))

for i in labels:
    print(i)

Output:

tensor([[-0.00e+00, -0.00e+00, -0.00e+00, -0.00e+00,  1.40e+00,  1.00e+01,
          2.59e+00,  7.18e+00, -1.20e+00,  2.69e-01,  1.68e+00, -1.57e+00,
         -0.00e+00, -0.00e+00, -0.00e+00, -0.00e+00, -5.71e-01, -1.00e+00,
          4.17e-06,  6.26e-01]])
tensor([[0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 1., 1.,
         1., 1.]])
tensor([[0, 1, 2, 0, 2]])

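The first element of labels holds the four offsets labeled for each anchor box, with shape (batch_size, 4 * num_anchors). The second element is the mask, also of shape (batch_size, 4 * num_anchors); its entries correspond one to one with the four offsets of each anchor box, and because the offsets of negative anchors should not affect the objective function, their mask entries are 0. The third element holds the class labeled for each anchor box, with shape (batch_size, num_anchors).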

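5. Predicting Bounding Boxes with Non-Maximum Suppression

At prediction time, multiple anchor boxes are generated for each image, the class and offset of every anchor box are predicted, the anchor boxes are adjusted by their predicted offsets to obtain predicted bounding boxes, and finally only the predicted bounding boxes that meet certain criteria are output.
Turning an anchor box and its predicted offset into a predicted bounding box is the inverse of the offset transformation described above.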

def offset_inverse(anchors, offset_preds):
    """Predict bounding boxes from anchor boxes and their predicted offsets."""
    anc = d2l.box_corner_to_center(anchors)
    pred_bbox_xy = (offset_preds[:, :2] * anc[:, 2:] / 10) + anc[:, :2]
    pred_bbox_wh = torch.exp(offset_preds[:, 2:] / 5) * anc[:, 2:]
    pred_bbox = torch.cat((pred_bbox_xy, pred_bbox_wh), axis=1)
    predicted_bbox = d2l.box_center_to_corner(pred_bbox)
    return predicted_bbox
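
That decoding really is the inverse of the offset labeling can be checked on a single box. The sketch below uses plain Python and hypothetical helpers `encode` and `decode` that work on boxes in center form (cx, cy, w, h); the box coordinates are made up.

```python
import math

SIGMA_XY, SIGMA_WH = 0.1, 0.2  # default constants of the offset formula (means are 0)


def encode(anchor, box):
    """Offset of `box` relative to `anchor`, both as (cx, cy, w, h)."""
    xa, ya, wa, ha = anchor
    xb, yb, wb, hb = box
    return ((xb - xa) / wa / SIGMA_XY, (yb - ya) / ha / SIGMA_XY,
            math.log(wb / wa) / SIGMA_WH, math.log(hb / ha) / SIGMA_WH)


def decode(anchor, offset):
    """Recover the box from the anchor and the offset (inverse of encode)."""
    xa, ya, wa, ha = anchor
    dx, dy, dw, dh = offset
    return (dx * SIGMA_XY * wa + xa, dy * SIGMA_XY * ha + ya,
            math.exp(dw * SIGMA_WH) * wa, math.exp(dh * SIGMA_WH) * ha)


anchor = (0.50, 0.50, 0.40, 0.30)  # hypothetical anchor box
box = (0.55, 0.45, 0.50, 0.20)     # hypothetical ground-truth box

recovered = decode(anchor, encode(anchor, box))
print(all(math.isclose(a, b) for a, b in zip(recovered, box)))  # True
```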

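When there are many anchor boxes, the model may output many similar, heavily overlapping predicted bounding boxes around the same object. To simplify the output, non-maximum suppression (NMS) merges similar predicted bounding boxes that belong to the same object.
Non-maximum suppression: for a predicted bounding box B, the object detection model computes a predicted probability for each class. Let p be the largest predicted probability; the class corresponding to p is the predicted class of B, and p is called the confidence of B. All predicted non-background bounding boxes are then sorted by confidence in descending order to produce a list L, which is processed as follows:

Select the predicted bounding box $B_1$ with the highest confidence in L as a basis, and remove from L every non-basis predicted bounding box whose IoU with $B_1$ exceeds a predefined threshold $\epsilon$. L now keeps the predicted bounding box with the highest confidence and drops the others that are too similar to it. In short, the bounding boxes with non-maximum confidence are suppressed.
Select the predicted bounding box $B_2$ with the second-highest confidence in L as another basis, and remove from L every non-basis predicted bounding box whose IoU with $B_2$ exceeds $\epsilon$.
Repeat this process until every predicted bounding box in L has been used as a basis. At that point the IoU of any pair of predicted bounding boxes in L does not exceed $\epsilon$, so no two of them are too similar.
Output all predicted bounding boxes in the list L.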
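
The procedure can be sketched in plain Python on four boxes. This is an illustration under stated assumptions, not the batched implementation: `iou` and `nms_indices` are hypothetical helpers, and the boxes and scores are the ones used in the example further down.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms_indices(boxes, scores, iou_threshold):
    """Return the indices of the boxes kept by non-maximum suppression."""
    # List L: box indices sorted by confidence, highest first
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # next basis: the most confident remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the basis too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0.10, 0.08, 0.52, 0.92), (0.08, 0.20, 0.56, 0.95),
         (0.15, 0.30, 0.62, 0.91), (0.55, 0.20, 0.90, 0.88)]
scores = [0.9, 0.8, 0.7, 0.9]

keep = nms_indices(boxes, scores, iou_threshold=0.5)
print(keep)  # [0, 3]
```

Boxes 1 and 2 overlap box 0 with IoU of roughly 0.74 and 0.55 and are suppressed, while box 3 does not overlap box 0 at all and survives.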

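def nms(boxes, scores, iou_threshold):
    """Sort predicted bounding boxes by confidence and return the indices kept by NMS."""
    B = torch.argsort(scores, dim=-1, descending=True)
    keep = []  # indices of the predicted bounding boxes to keep
    while B.numel() > 0:
        i = B[0]
        keep.append(i)
        if B.numel() == 1: break
        iou = box_iou(boxes[i, :].reshape(-1, 4),
                      boxes[B[1:], :].reshape(-1, 4)).reshape(-1)
        # Keep only the boxes whose IoU with the current basis is at most the threshold
        inds = torch.nonzero(iou <= iou_threshold).reshape(-1)
        B = B[inds + 1]  # + 1 because iou was computed against B[1:]
    return torch.tensor(keep, device=boxes.device)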

def multibox_detection(cls_probs, offset_preds, anchors, nms_threshold=0.5,
                       pos_threshold=0.009999999):
    """Predict bounding boxes using non-maximum suppression."""
    device, batch_size = cls_probs.device, cls_probs.shape[0]
    anchors = anchors.squeeze(0)
    num_classes, num_anchors = cls_probs.shape[1], cls_probs.shape[2]
    out = []
    for i in range(batch_size):
        cls_prob, offset_pred = cls_probs[i], offset_preds[i].reshape(-1, 4)
        conf, class_id = torch.max(cls_prob[1:], 0)
        predicted_bb = offset_inverse(anchors, offset_pred)
        keep = nms(predicted_bb, conf, nms_threshold)

        # Find the indices that NMS did not keep and mark those boxes as background (-1)
        all_idx = torch.arange(num_anchors, dtype=torch.long, device=device)
        combined = torch.cat((keep, all_idx))
        uniques, counts = combined.unique(return_counts=True)
        non_keep = uniques[counts == 1]
        all_id_sorted = torch.cat((keep, non_keep))
        class_id[non_keep] = -1
        class_id = class_id[all_id_sorted]
        conf, predicted_bb = conf[all_id_sorted], predicted_bb[all_id_sorted]

        # Predictions whose confidence is below pos_threshold are also treated as background
        below_min_idx = (conf < pos_threshold)
        class_id[below_min_idx] = -1
        conf[below_min_idx] = 1 - conf[below_min_idx]
        pred_info = torch.cat((class_id.unsqueeze(1),
                               conf.unsqueeze(1),
                               predicted_bb), dim=1)
        out.append(pred_info)
    return torch.stack(out)

Example:

anchors = torch.tensor([[0.1, 0.08, 0.52, 0.92], [0.08, 0.2, 0.56, 0.95],
                      [0.15, 0.3, 0.62, 0.91], [0.55, 0.2, 0.9, 0.88]])
offset_preds = torch.tensor([0] * anchors.numel())
cls_probs = torch.tensor([[0] * 4,
                      [0.9, 0.8, 0.7, 0.1],
                      [0.1, 0.2, 0.3, 0.9]])

fig = d2l.plt.imshow(img)
show_bboxes(fig.axes, anchors * bbox_scale,
            ['dog=0.9', 'dog=0.8', 'dog=0.7', 'cat=0.9'])

We can now call the multibox_detection function to perform non-maximum suppression, with the threshold set to 0.5.

output = multibox_detection(cls_probs.unsqueeze(dim=0),
                            offset_preds.unsqueeze(dim=0),
                            anchors.unsqueeze(dim=0),
                            nms_threshold=0.5)

fig = d2l.plt.imshow(img)
for i in output[0].detach().numpy():
    if i[0] == -1:
        continue
    label = ('dog=', 'cat=')[int(i[0])] + str(i[1])
    show_bboxes(fig.axes, [torch.tensor(i[2:]) * bbox_scale], label)
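
Each row that multibox_detection returns is built as (class_id, confidence, x1, y1, x2, y2), with class_id set to -1 for suppressed or background boxes. A minimal sketch of reading such rows without plotting follows; the rows, the `min_conf` parameter, and the helper `kept_detections` are illustrative assumptions, not output copied from a run.

```python
CLASS_NAMES = ('dog', 'cat')

# Illustrative rows in the layout (class_id, confidence, x1, y1, x2, y2)
detections = [
    [0.0, 0.9, 0.10, 0.08, 0.52, 0.92],
    [1.0, 0.9, 0.55, 0.20, 0.90, 0.88],
    [-1.0, 0.8, 0.08, 0.20, 0.56, 0.95],
    [-1.0, 0.7, 0.15, 0.30, 0.62, 0.91],
]


def kept_detections(rows, min_conf=0.5):
    """Return (class name, confidence, box) for rows that are not background."""
    results = []
    for class_id, conf, *box in rows:
        if class_id == -1 or conf < min_conf:
            continue  # skip suppressed, background, or low-confidence rows
        results.append((CLASS_NAMES[int(class_id)], conf, tuple(box)))
    return results


for name, conf, box in kept_detections(detections):
    print(name, conf, box)
```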

Reference: Dive into Deep Learning (动手学深度学习)

Original: https://blog.csdn.net/weixin_43276033/article/details/124623669
Author: 峡谷的小鱼
Title: PyTorch: An Introduction to Object Detection

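Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/680363/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author and source!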
