YOLOv5 ONNX inference: a working example and notes on the approach (with a walkthrough of the latest detect.py source)

July 21, 2023, 5:43 AM • Artificial Intelligence

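I recently exported a YOLOv5 model to ONNX and wanted a script to verify the results and check whether they differ from inference with the .pt file. The official detect.py can already run inference on every supported model format, but using it I never felt I understood what was going on underneath, and it imports a lot of external libraries. So I decided to write a standalone inference script. Running inference on a dummy all-zeros input would be trivial, but I wanted to predict directly on an image and save the annotated result.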

2022/09/07: a CPU version of the inference script has been added on GitHub.

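First, onnxruntime obviously has to be installed. I use the GPU build (onnxruntime-gpu), together with torch, torchvision, opencv-python and the matching CUDA toolkit; I won't go into the installation here.
The plan is as follows:
1. Preprocess the image and convert it to the input size of the ONNX model.
2. After inference returns all the candidate boxes, use non_max_suppression to drop the unwanted ones: boxes whose confidence is too low, and redundant boxes that overlap a better box too much (measured by IoU).
3. Convert the box coordinates back to the coordinates of the original image (the image was resized during preprocessing).
4. Draw the boxes and label names on the original image and save it (using OpenCV: cv2.rectangle and cv2.putText).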

Let's first look at how detect.py does it, and then pick out the pieces we need for our task.

1. Loading the ONNX model


device = select_device(device)
model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
stride, names, pt = model.stride, model.names, model.pt
imgsz = check_img_size(imgsz, s=stride)

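Line 1, device: this simply selects whether to run on the CPU or the GPU. We have CUDA, so we run on CUDA and device should be set to cuda.
Line 2, DetectMultiBackend: in the latest YOLOv5 source (as of August 2022), this class is what loads the different model formats. Older code is different: in YOLOv5 projects from 2021 that you find online, this step may call attempt_load instead. So we need to look at how this class is implemented.
Line 3 reads some basic model information: stride is usually 32, names holds the class label names, and pt tells whether inference uses PyTorch .pt weights. We use ONNX, so pt=False, which matters later.
Line 4 checks that the input image size is a multiple of the stride (32). YOLOv5 already requires this at training time; if the size is not a multiple of 32, it is adjusted to one.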
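As a rough illustration of what that size check amounts to, the sketch below rounds a requested size up to the next multiple of the stride. The helper name round_up_to_stride is my own and is not part of YOLOv5; the real check_img_size does more than this (for example, it also accepts a list of sizes).

```python
import math

def round_up_to_stride(size, stride=32):
    """Return the smallest multiple of `stride` that is >= `size` (hypothetical helper)."""
    return math.ceil(size / stride) * stride

# 640 is already a multiple of 32; 641 and 100 are rounded up
for requested in (640, 641, 100):
    print(requested, '->', round_up_to_stride(requested))
```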
Now let's take a closer look at DetectMultiBackend:


elif onnx:
    LOGGER.info(f'Loading {w} for ONNX Runtime inference...')
    cuda = torch.cuda.is_available()
    check_requirements(('onnx', 'onnxruntime-gpu' if cuda else 'onnxruntime'))
    import onnxruntime
    providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if cuda else ['CPUExecutionProvider']
    session = onnxruntime.InferenceSession(w, providers=providers)
    meta = session.get_modelmeta().custom_metadata_map
    if 'stride' in meta:
        stride, names = int(meta['stride']), eval(meta['names'])

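This code lives in common.py under the models folder. The class is long, so I only show the ONNX branch we care about.
It first checks whether CUDA is available and whether the matching onnxruntime package is installed (onnxruntime-gpu with CUDA, onnxruntime otherwise). If CUDA is usable, 'CUDAExecutionProvider' is put first in the provider list, with 'CPUExecutionProvider' as the fallback. It also reads stride and names from the model metadata when the export stored them. So this is the piece of code that loads the ONNX model.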
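The provider choice does not have to go through torch. A minimal sketch of a variation (not what detect.py does): filter a preferred list against whatever providers are actually available. pick_providers is a hypothetical helper name, and the lists below are hand-written stand-ins for a real query result.

```python
def pick_providers(available, preferred=('CUDAExecutionProvider', 'CPUExecutionProvider')):
    """Keep the preferred providers that are actually available, in priority order."""
    return [p for p in preferred if p in available]

# CPU-only machine
print(pick_providers(['CPUExecutionProvider']))
# Machine where several providers are available
print(pick_providers(['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']))
```

With onnxruntime installed, the available list can come from onnxruntime.get_available_providers(), and the result can be passed as providers= to onnxruntime.InferenceSession.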

2. Preprocessing the image


if webcam:
    view_img = check_imshow()
    cudnn.benchmark = True
    dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=pt)
    bs = len(dataset)
else:
    dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt)
    bs = 1

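We only run inference on a single image, so we end up in the else branch with LoadImages. Two parameters matter here, img_size and auto:
img_size is the input size of the ONNX model. The default export uses (640, 640); you can change this value when exporting.
auto is passed as auto=pt, so it is True only for .pt weights. It controls whether letterbox pads only up to the nearest multiple of stride instead of to the full img_size. An ONNX model exported with a fixed input size needs exactly that size, so for us auto should be False.
Next, the implementation (in dataloaders.py under utils):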


else:
    # Read image
    self.count += 1
    img0 = cv2.imread(path)  # BGR
    assert img0 is not None, f'Image Not Found {path}'
    s = f'image {self.count}/{self.nf} {path}: '

# Padded resize
img = letterbox(img0, self.img_size, stride=self.stride, auto=self.auto)[0]

# Convert
img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
img = np.ascontiguousarray(img)

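We only have one image, so we go straight to the else part. It reads the image with cv2.imread, then letterbox resizes and pads it to 640×640. After that the array is transposed from HWC to CHW, the channel order is reversed from BGR to RGB, and the result is made a contiguous array. So for our own script it is enough to copy the letterbox function (in augmentations.py under utils) with sensible adjustments.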
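To make the effect of auto concrete, here is a small sketch that reproduces only the arithmetic of letterbox (no image is touched). letterbox_geometry is a name I made up for this illustration; the example assumes a 1280×720 source image.

```python
def letterbox_geometry(h, w, new_shape=(640, 640), auto=False, stride=32):
    """Return the resized (w, h), the padding (top, bottom, left, right) and the padded (h, w)."""
    r = min(new_shape[0] / h, new_shape[1] / w)        # scale ratio (new / old)
    new_unpad = int(round(w * r)), int(round(h * r))    # resized (w, h) before padding
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]
    if auto:                                            # pad only up to a multiple of stride
        dw, dh = dw % stride, dh % stride
    dw, dh = dw / 2, dh / 2                             # split the padding between both sides
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    padded = (new_unpad[1] + top + bottom, new_unpad[0] + left + right)
    return new_unpad, (top, bottom, left, right), padded

print(letterbox_geometry(720, 1280, auto=False))  # padded to the full 640x640
print(letterbox_geometry(720, 1280, auto=True))   # padded only to a stride multiple
```

With auto=False the padded shape is (640, 640); with auto=True it is (384, 640), which a model exported with a fixed 640×640 input would reject.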

3. Running inference


im = torch.from_numpy(im).to(device)
im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
im /= 255  # 0 - 255 to 0.0 - 1.0
if len(im.shape) == 3:
    im = im[None]  # expand for batch dim
t2 = time_sync()
dt[0] += t2 - t1

visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
pred = model(im, augment=augment, visualize=visualize)

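Finally, the inference part. Line 1 turns the array into a tensor for the following steps. In line 2 we take the im.float() branch, because our ONNX model is not run in half precision. The next lines scale the pixel values to [0, 1] and add a batch dimension if it is missing; our script needs both. The timing lines (t2, dt) and the visualize line can be skipped. The last line is the actual inference. So what happens inside? Let's jump to common.py under models: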


elif self.onnx:
    im = im.cpu().numpy()  # torch to numpy
    y = self.session.run([self.session.get_outputs()[0].name], {self.session.get_inputs()[0].name: im})[0]

It is actually very simple, just two statements: convert the tensor back to a NumPy array, then feed it to the ONNX session for inference.
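Since the ONNX branch converts the tensor back to NumPy before calling the session, the input can also be prepared in NumPy alone. A minimal sketch, assuming an FP32 model input; to_model_input is my own helper name, and the dummy array stands in for a real letterboxed image.

```python
import numpy as np

def to_model_input(img_bgr_hwc):
    """Convert a letterboxed uint8 BGR image (H, W, 3) into a float32 NCHW batch in [0, 1]."""
    chw_rgb = img_bgr_hwc.transpose((2, 0, 1))[::-1]                 # HWC to CHW, BGR to RGB
    batch = np.ascontiguousarray(chw_rgb, dtype=np.float32)[None]    # add the batch dimension
    return batch / 255.0                                             # 0 - 255 to 0.0 - 1.0

# Dummy 640x640 image filled with the letterbox gray value
dummy = np.full((640, 640, 3), 114, dtype=np.uint8)
blob = to_model_input(dummy)
print(blob.shape, blob.dtype)
```

The resulting array has the layout that session.run expects in the excerpt above, so it could be passed as the input value directly.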

4. Removing redundant boxes

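I'm using yolov5s. At the end of step 3 we get an array of shape (1, 25200, 7), which can be read as 25200 candidate boxes. Each row holds four box values (center x, center y, width, height), one objectness confidence, and one score per class; my model has two classes, hence 4 + 1 + 2 = 7. The 25200 comes from the three detection scales at a 640×640 input with three anchors each: 3 × (80² + 40² + 20²) = 25200. Most of these boxes are redundant and have to be removed. In detect.py this takes a single line: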


pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)

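Here pred is the prediction from step 3. conf_thres defaults to 0.25 and iou_thres to 0.45: boxes whose confidence is below conf_thres are dropped, and a box is suppressed when its IoU with a higher-scoring box exceeds iou_thres. classes restricts the result to the classes you specify and defaults to None (all classes). agnostic_nms defaults to False; when enabled, NMS is applied across all classes together instead of per class. max_det is the maximum number of boxes kept per image; the function's own default is 300, while detect.py passes 1000.
This function lives in general.py under utils. The implementation is fairly involved, so we can strip the features we don't need; for example, classes is rarely needed, so the related code can go.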
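To show how the two thresholds act, here is a deliberately simple pure-Python sketch of confidence filtering followed by greedy NMS. It is not the YOLOv5 implementation (which works on tensors and calls torchvision.ops.nms); simple_nms and the sample detections are made up for illustration.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def simple_nms(dets, conf_thres=0.25, iou_thres=0.45, max_det=300):
    """dets: list of (x1, y1, x2, y2, score). Returns the kept detections, best first."""
    # 1) drop low-confidence boxes, 2) visit the rest from highest to lowest score
    candidates = sorted((d for d in dets if d[4] > conf_thres), key=lambda d: d[4], reverse=True)
    kept = []
    for d in candidates:
        # keep a box only if it does not overlap an already kept box too much
        if all(iou(d, k) <= iou_thres for k in kept):
            kept.append(d)
        if len(kept) == max_det:
            break
    return kept

dets = [
    (10, 10, 110, 110, 0.90),
    (12, 12, 112, 112, 0.80),    # overlaps the first box heavily, so it is suppressed
    (200, 200, 300, 300, 0.60),
    (400, 400, 450, 450, 0.10),  # below conf_thres, so it is dropped
]
print(simple_nms(dets))
```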


box = xywh2xyxy(x[:, :4])
iou = box_iou(boxes[i], boxes)

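I single out these two lines because they call other helper functions. The first converts boxes from (center x, center y, width, height) to corner format (x1, y1, x2, y2); the second computes the IoU between boxes. xywh2xyxy is also in general.py, while box_iou is in metrics.py under utils.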
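A small worked example of both helpers, written with NumPy instead of torch so the numbers are easy to check. This is a sketch that mirrors the idea of xywh2xyxy and the broadcasting in box_iou; the function names and sample boxes are mine.

```python
import numpy as np

def xywh_to_xyxy(b):
    """(center x, center y, w, h) rows to (x1, y1, x2, y2) rows."""
    b = np.asarray(b, dtype=np.float64)
    out = np.empty_like(b)
    out[:, 0] = b[:, 0] - b[:, 2] / 2  # top-left x
    out[:, 1] = b[:, 1] - b[:, 3] / 2  # top-left y
    out[:, 2] = b[:, 0] + b[:, 2] / 2  # bottom-right x
    out[:, 3] = b[:, 1] + b[:, 3] / 2  # bottom-right y
    return out

def pairwise_iou(box1, box2, eps=1e-7):
    """IoU between every box in box1 (N, 4) and every box in box2 (M, 4), xyxy format."""
    a1, a2 = box1[:, None, :2], box1[:, None, 2:]   # (N, 1, 2)
    b1, b2 = box2[None, :, :2], box2[None, :, 2:]   # (1, M, 2)
    inter = np.clip(np.minimum(a2, b2) - np.maximum(a1, b1), 0, None).prod(axis=2)  # (N, M)
    area1 = ((box1[:, 2] - box1[:, 0]) * (box1[:, 3] - box1[:, 1]))[:, None]
    area2 = ((box2[:, 2] - box2[:, 0]) * (box2[:, 3] - box2[:, 1]))[None, :]
    return inter / (area1 + area2 - inter + eps)

# Two 100x100 boxes whose centers are 50 px apart horizontally
boxes = xywh_to_xyxy([[50, 50, 100, 100], [100, 50, 100, 100]])
print(boxes)
print(pairwise_iou(boxes, boxes))
```

The two boxes share half of their area, so the off-diagonal IoU is 5000 / 15000, about 0.333.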

5. Annotating and saving the image


for i, det in enumerate(pred):
    det[:, :4] = scale_coords(im.shape[2:], det[:, :4], img0.shape).round()

annotator = Annotator(img0, line_width=3)

for *xyxy, conf, cls in reversed(det):
    c = int(cls)
    label = f'{names[c]} {conf:.2f}'
    annotator.box_label(xyxy, label, color=colors(c, True))

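I modified this slightly to make it easier to read.
The first step converts the coordinates in the tensor from step 4 back to the original image size using scale_coords. The drawing uses the Annotator class: once the label text is built, Annotator draws the box and the label onto the image, which can then be saved.
scale_coords is in general.py under utils, and Annotator is in plots.py under utils.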
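The coordinate conversion is just the letterbox arithmetic run backwards: subtract the padding, then divide by the scale. A minimal single-box sketch of that idea, assuming a 1280×720 original image and a 640×640 model input; scale_box_to_original is a hypothetical name, not the YOLOv5 function.

```python
def scale_box_to_original(box, model_shape, orig_shape):
    """Map an xyxy box from the letterboxed image to the original image; shapes are (h, w)."""
    gain = min(model_shape[0] / orig_shape[0], model_shape[1] / orig_shape[1])  # resize ratio
    pad_x = (model_shape[1] - orig_shape[1] * gain) / 2                         # left padding
    pad_y = (model_shape[0] - orig_shape[0] * gain) / 2                         # top padding
    x1, y1, x2, y2 = box
    scaled = [(x1 - pad_x) / gain, (y1 - pad_y) / gain, (x2 - pad_x) / gain, (y2 - pad_y) / gain]
    # clip to the original image bounds
    limits = [orig_shape[1], orig_shape[0], orig_shape[1], orig_shape[0]]
    return [min(max(v, 0.0), float(lim)) for v, lim in zip(scaled, limits)]

# gain = 0.5, pad_x = 0, pad_y = 140 for this pair of shapes
print(scale_box_to_original((100, 200, 300, 400), (640, 640), (720, 1280)))
```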

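You really do have to read the source to level up; just running other people's code only makes you a hyperparameter tweaker. Keep at it!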

After all the reading above and a long debugging session, we end up with the following script (it's a bit long):


import onnxruntime
import torch
import torchvision
import cv2
import numpy as np
import time
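# Path to the exported ONNX model
w = 'best.onnx'
# Prefer the CUDA provider when a GPU is available, otherwise fall back to the CPU
cuda = torch.cuda.is_available()
providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if cuda else ['CPUExecutionProvider']
session = onnxruntime.InferenceSession(w, providers=providers)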

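def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Resize and pad the image to new_shape while keeping its aspect ratio.
    # Unlike the YOLOv5 original, this trimmed version returns only the image.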
    shape = im.shape[:2]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:
        r = min(r, 1.0)

    ratio = r, r
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]
    if auto:
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)
    elif scaleFill:
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]

    dw /= 2
    dh /= 2

    if shape[::-1] != new_unpad:
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)
    return im
def xywh2xyxy(x):
    # Convert boxes from (center x, center y, w, h) to (x1, y1, x2, y2)
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2
    y[:, 1] = x[:, 1] - x[:, 3] / 2
    y[:, 2] = x[:, 0] + x[:, 2] / 2
    y[:, 3] = x[:, 1] + x[:, 3] / 2
    return y
def box_area(box):
    # box is 4xn (rows are x1, y1, x2, y2); returns the n box areas
    return (box[2] - box[0]) * (box[3] - box[1])
def box_iou(box1, box2, eps=1e-7):
    # Pairwise IoU between box1 (N, 4) and box2 (M, 4) in xyxy format; returns an (N, M) tensor
    (a1, a2), (b1, b2) = box1[:, None].chunk(2, 2), box2.chunk(2, 1)
    inter = (torch.min(a2, b2) - torch.max(a1, b1)).clamp(0).prod(2)

    return inter / (box_area(box1.T)[:, None] + box_area(box2.T) - inter + eps)
def non_max_suppression(prediction,
                        conf_thres=0.25,
                        iou_thres=0.45,
                        agnostic=False,
                        max_det=300):
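    bs = prediction.shape[0]  # batch size
    xc = prediction[..., 4] > conf_thres  # candidates: objectness above the threshold

    max_wh = 7680  # maximum box width and height in pixels, used as the class offset
    max_nms = 30000  # maximum number of boxes passed to torchvision.ops.nms()
    redundant = True  # require redundant detections (only used when merge is True)
    merge = False  # use merge-NMS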
    output = [torch.zeros((0, 6), device = prediction.device)] * bs
    for xi, x in enumerate(prediction):

        x = x[xc[xi]]

        if not x.shape[0]:
            continue

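        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        box = xywh2xyxy(x[:, :4])  # (center x, center y, w, h) to (x1, y1, x2, y2)

        # Keep only the best class per box: rows become (xyxy, conf, cls)
        conf, j = x[:, 5:].max(1, keepdim=True)
        x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]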

        n = x.shape[0]
        if not n:
            continue
        elif n > max_nms:
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]

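        c = x[:, 5:6] * (0 if agnostic else max_wh)  # per-class offset so boxes of different classes never overlap
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # indices of the kept boxes
        if i.shape[0] > max_det:  # limit the number of detections
            i = i[:max_det]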
        if merge and (1 < n < 3E3):

            iou = box_iou(boxes[i], boxes) > iou_thres
            weights = iou * scores[None]
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)
            if redundant:
                i = i[iou.sum(1) > 1]

        output[xi] = x[i]
    return output
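def scale_coords(img1_shape, coords, img0_shape, ratio_pad=None):
    # Rescale xyxy coords from the letterboxed shape (img1_shape) back to the original image shape (img0_shape)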
    if ratio_pad is None:
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])
        pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2
    else:
        gain = ratio_pad[0][0]
        pad = ratio_pad[1]

    coords[:, [0, 2]] -= pad[0]
    coords[:, [1, 3]] -= pad[1]
    coords[:, :4] /= gain
    clip_coords(coords, img0_shape)
    return coords
def clip_coords(boxes, shape):
    # Clip xyxy boxes in place to the image bounds; shape is (height, width)
    if isinstance(boxes, torch.Tensor):
        boxes[:, 0].clamp_(0, shape[1])
        boxes[:, 1].clamp_(0, shape[0])
        boxes[:, 2].clamp_(0, shape[1])
        boxes[:, 3].clamp_(0, shape[0])
    else:
        boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, shape[1])
        boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, shape[0])
class Annotator:
    def __init__(self, im, line_width=None):
        assert im.data.contiguous, 'Image not contiguous. Apply np.ascontiguousarray(im) to Annotator() input images.'
        self.im = im
        self.lw = line_width or max(round(sum(im.shape) / 2 * 0.003), 2)

    def box_label(self, box, label='', color=(128, 128, 128), txt_color=(255, 255, 255)):
        # Draw one box on self.im, plus a filled label background and the label text
        p1, p2 = (int(box[0]), int(box[1])), (int(box[2]), int(box[3]))
        cv2.rectangle(self.im, p1, p2, color, thickness=self.lw, lineType=cv2.LINE_AA)
        if label:
            tf = max(self.lw - 1, 1)
            w, h = cv2.getTextSize(label, 0, fontScale=self.lw / 3, thickness=tf)[0]
            outside = p1[1] - h >= 3
            p2 = p1[0] + w, p1[1] - h - 3 if outside else p1[1] + h + 3
            cv2.rectangle(self.im, p1, p2, color, -1, cv2.LINE_AA)
            cv2.putText(self.im,
                        label, (p1[0], p1[1] - 2 if outside else p1[1] + h + 2),
                        0,
                        self.lw / 3,
                        txt_color,
                        thickness=tf,
                        lineType=cv2.LINE_AA)

    # rectangle() and text() from the original Annotator relied on PIL
    # (self.draw, self.font), which this OpenCV-only version never sets up,
    # so they are omitted here.

    def result(self):

        return np.asarray(self.im)
class Colors:
    def __init__(self):

        hexs = ('FF3838', 'FF9D97', 'FF701F', 'FFB21D', 'CFD231', '48F90A', '92CC17', '3DDB86', '1A9334', '00D4BB',
                '2C99A8', '00C2FF', '344593', '6473FF', '0018EC', '8438FF', '520085', 'CB38FF', 'FF95C8', 'FF37C7')
        self.palette = [self.hex2rgb(f'#{c}') for c in hexs]
        self.n = len(self.palette)

    def __call__(self, i, bgr=False):
        c = self.palette[int(i) % self.n]
        return (c[2], c[1], c[0]) if bgr else c

    @staticmethod
    def hex2rgb(h):
        return tuple(int(h[1 + i:1 + i + 2], 16) for i in (0, 2, 4))

colors = Colors()
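# Read the image (BGR, HWC) and letterbox it to the model's input size
img0 = cv2.imread('test.png')
assert img0 is not None, 'Image Not Found test.png'
img = letterbox(img0, (640, 640), stride=32, auto=False)
img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
img = np.ascontiguousarray(img)
# Use the GPU for the torch-side steps only when CUDA is available
device = torch.device('cuda' if cuda else 'cpu')
im = torch.from_numpy(img).to(device)
im = im.float()
im /= 255  # 0 - 255 to 0.0 - 1.0
if len(im.shape) == 3:
    im = im[None]  # add the batch dimension
im = im.cpu().numpy()
y = session.run([session.get_outputs()[0].name], {session.get_inputs()[0].name: im})[0]

y = torch.from_numpy(y).to(device)
pred = non_max_suppression(y, conf_thres=0.25, iou_thres=0.45, agnostic=False, max_det=1000)

# Map the box coordinates from the letterboxed image back to the original image
for i, det in enumerate(pred):
    det[:, :4] = scale_coords(im.shape[2:], det[:, :4], img0.shape).round()
print(det)

names = ['nofall', 'fall']

annotator = Annotator(img0, line_width=3)

for *xyxy, conf, cls in reversed(det):
    c = int(cls)
    label = f'{names[c]} {conf:.2f}'
    annotator.box_label(xyxy, label, color=colors(c, True))

# Note: this overwrites the input file; change the name to keep the original image
cv2.imwrite('test.png', img0)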

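A note on warmup: YOLOv5's detect.py first feeds the model a dummy all-zeros input to preload it, so that the actual predictions have lower latency. In my test the dummy input took 0.73 s and the real prediction 0.01 s.
Without warmup, however, the first prediction simply takes about the sum of the two, and every prediction from the second one onward is around 0.01 s anyway. So warmup didn't seem useful in my scenario and I commented it out.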
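To check this on your own machine, per-call timing is enough. The sketch below times repeated calls of any function; fake_inference is a made-up stand-in that simulates a one-time setup cost and is not a real model call.

```python
import time

def time_calls(fn, n=3):
    """Call fn() n times and return the elapsed seconds of each call."""
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings

# Hypothetical stand-in for a model call: the first call pays a one-time setup cost
state = {'ready': False}

def fake_inference():
    if not state['ready']:
        time.sleep(0.05)    # simulated one-time initialization
        state['ready'] = True
    time.sleep(0.001)       # simulated steady-state cost

print([round(t, 3) for t in time_calls(fake_inference)])
```

Replacing fake_inference with a function that calls session.run on a prepared input would show the first-call overhead of the real model.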

Original: https://blog.csdn.net/weixin_43945848/article/details/126503453
Author: XINFINFZ
Title: YOLOv5 ONNX inference: a working example and notes on the approach (with a walkthrough of the latest detect.py source)

