TensorRT Pitfalls Diary: Converting a Trained Model to ONNX and Then to an Engine

June 24, 2023, 10:40 PM • Artificial Intelligence • 649 views

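I won't go into what TensorRT is for here.
Before you can use TensorRT for acceleration, you need a trained model.
The trained model is then converted to ONNX, and the ONNX model is converted to a TensorRT engine.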

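1. Converting the trained model to ONNX
This post only covers converting a PyTorch model to ONNX. There are plenty of tutorials online for other frameworks.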

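import torch
import onnx

# best.pt is assumed to hold the whole pickled model (saved with torch.save(model, ...)),
# not just a state_dict; on PyTorch >= 2.6 you may need torch.load(..., weights_only=False)
model = torch.load('best.pt', map_location='cpu')
model.eval()  # switch BatchNorm/Dropout to inference behaviour before exporting

input_names = ['input']
output_names = ['output']
# Dummy input used to trace the graph; its shape must match what the model expects
x = torch.randn(1, 3, 32, 32)
torch.onnx.export(model, x, 'flame.onnx', input_names=input_names, output_names=output_names, verbose=True)

# Sanity-check the exported file
onnx.checker.check_model(onnx.load('flame.onnx'))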

Run the script, and it writes flame.onnx to the current directory.
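The script above exports a fixed 1x3x32x32 input. trtexec's --minShapes/--optShapes/--maxShapes flags only take effect when the ONNX input has dynamic dimensions. Below is a minimal sketch of an export with a dynamic batch dimension using the `dynamic_axes` argument of `torch.onnx.export`. It assumes the tensor names `input`/`output` from the script above. The dimension name `batch`, opset 13 and both helper names are hypothetical choices of mine, not the author's method.

```python
from typing import Dict, Sequence


def batch_dynamic_axes(names: Sequence[str], dim_name: str = "batch") -> Dict[str, Dict[int, str]]:
    """Mark dimension 0 (the batch dimension) of every named tensor as dynamic."""
    return {name: {0: dim_name} for name in names}


def export_dynamic_batch(model, dummy_input, path: str,
                         input_names: Sequence[str] = ("input",),
                         output_names: Sequence[str] = ("output",),
                         opset: int = 13) -> None:
    """Export `model` to ONNX with a dynamic batch dimension (sketch)."""
    import torch  # imported lazily so batch_dynamic_axes() has no PyTorch dependency

    model.eval()
    with torch.no_grad():
        torch.onnx.export(
            model,
            dummy_input,
            path,
            input_names=list(input_names),
            output_names=list(output_names),
            dynamic_axes=batch_dynamic_axes(list(input_names) + list(output_names)),
            opset_version=opset,
        )
```

For example, `export_dynamic_batch(model, torch.randn(1, 3, 32, 32), 'flame.onnx')` followed by trtexec with `--minShapes=input:1x3x32x32 --optShapes=input:4x3x32x32 --maxShapes=input:8x3x32x32` should give an engine that accepts batch sizes 1 to 8. The batch sizes are only illustrative.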

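2. Converting ONNX to an engine
You can convert the ONNX model to an engine directly with trtexec, which ships with TensorRT. It sits in the bin directory of the TensorRT installation. On Ubuntu, trtexec lives in:

/usr/src/tensorrt/bin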
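trtexec is often not on PATH, so a small lookup helper can save some guessing when scripting the conversion. This is a minimal sketch. The function name `find_trtexec` and the search order (PATH first, then the Ubuntu directory above) are hypothetical choices, and other installations may put trtexec elsewhere.

```python
import os
import shutil
from typing import Iterable, Optional

# Default location on Ubuntu; other TensorRT installs may use a different directory.
DEFAULT_TRT_BIN_DIRS = ("/usr/src/tensorrt/bin",)


def find_trtexec(search_dirs: Iterable[str] = DEFAULT_TRT_BIN_DIRS,
                 use_path: bool = True) -> Optional[str]:
    """Return the path to an executable trtexec, or None if it cannot be found."""
    if use_path:
        found = shutil.which("trtexec")  # searches the PATH environment variable
        if found:
            return found
    for directory in search_dirs:
        candidate = os.path.join(directory, "trtexec")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Calling `find_trtexec()` returns either a usable path or `None`, in which case you still need to locate the TensorRT installation yourself.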

Run trtexec -h to see the help text:

=== Model Options ===
  --uff=<file>                UFF model
  --onnx=<file>               ONNX model
  --model=<file>              Caffe model (default = no model, random weights used)
  --deploy=<file>             Caffe prototxt file
  --output=<name>[,<name>]*   Output names (it can be specified multiple times); at least one output is required for UFF and Caffe
  --uffInput=<name>,X,Y,Z     Input blob name and its dimensions (X,Y,Z=C,H,W), it can be specified multiple times; at least one is required for UFF models
  --uffNHWC                   Set if inputs are in the NHWC layout instead of NCHW (use X,Y,Z=H,W,C order in --uffInput)

=== Build Options ===
  --maxBatch                  Set max batch size and build an implicit batch engine (default = 1)
  --explicitBatch             Use explicit batch sizes when building the engine (default = implicit)
  --minShapes=spec            Build with dynamic shapes using a profile with the min shapes provided
  --optShapes=spec            Build with dynamic shapes using a profile with the opt shapes provided
  --maxShapes=spec            Build with dynamic shapes using a profile with the max shapes provided
  --minShapesCalib=spec       Calibrate with dynamic shapes using a profile with the min shapes provided
  --optShapesCalib=spec       Calibrate with dynamic shapes using a profile with the opt shapes provided
  --maxShapesCalib=spec       Calibrate with dynamic shapes using a profile with the max shapes provided
                              Note: All three of min, opt and max shapes must be supplied.

                                    However, if only opt shapes is supplied then it will be expanded so
                                    that min shapes and max shapes are set to the same values as opt shapes.

                                    In addition, use of dynamic shapes implies explicit batch.

                                    Input names can be wrapped with escaped single quotes (ex: \'Input:0\').

                              Example input shapes spec: input0:1x3x256x256,input1:1x3x128x128
                              Each input shape is supplied as a key-value pair where key is the input name and
                              value is the dimensions (including the batch dimension) to be used for that input.

                              Each key-value pair has the key and value separated using a colon (:).

                              Multiple input shapes can be provided via comma-separated key-value pairs.

  --inputIOFormats=spec       Type and format of each of the input tensors (default = all inputs in fp32:chw)
                              See --outputIOFormats help for the grammar of type and format list.

                              Note: If this option is specified, please set comma-separated types and formats for all
                                    inputs following the same order as network inputs ID (even if only one input
                                    needs specifying IO format) or set the type and format once for broadcasting.

  --outputIOFormats=spec      Type and format of each of the output tensors (default = all outputs in fp32:chw)
                              Note: If this option is specified, please set comma-separated types and formats for all
                                    outputs following the same order as network outputs ID (even if only one output
                                    needs specifying IO format) or set the type and format once for broadcasting.

                              IO Formats: spec  ::= IOfmt[","spec]
                                          IOfmt ::= type:fmt
                                          type  ::= "fp32"|"fp16"|"int32"|"int8"
                                          fmt   ::= ("chw"|"chw2"|"chw4"|"hwc8"|"chw16"|"chw32"|"dhwc8")["+"fmt]
  --workspace=N               Set workspace size in megabytes (default = 16)
  --noBuilderCache            Disable timing cache in builder (default is to enable timing cache)
  --nvtxMode=mode             Specify NVTX annotation verbosity. mode ::= default|verbose|none
  --minTiming=M               Set the minimum number of iterations used in kernel selection (default = 1)
  --avgTiming=M               Set the number of times averaged in each iteration for kernel selection (default = 8)
  --noTF32                    Disable tf32 precision (default is to enable tf32, in addition to fp32)
  --refit                     Mark the engine as refittable. This will allow the inspection of refittable layers
                              and weights within the engine.

  --fp16                      Enable fp16 precision, in addition to fp32 (default = disabled)
  --int8                      Enable int8 precision, in addition to fp32 (default = disabled)
  --best                      Enable all precisions to achieve the best performance (default = disabled)
  --calib=<file>              Read INT8 calibration cache file
  --safe                      Only test the functionality available in safety restricted flows
  --saveEngine=<file>         Save the serialized engine
  --loadEngine=<file>         Load a serialized engine
  --tacticSources=tactics     Specify the tactics to be used by adding (+) or removing (-) tactics from the default
                              tactic sources (default = all available tactics).

                              Note: Currently only cuBLAS and cuBLAS LT are listed as optional tactics.

                              Tactic Sources: tactics ::= [","tactic]
                                              tactic  ::= (+|-)lib
                                              lib     ::= "cublas"|"cublasLt"

=== Inference Options ===
  --batch=N                   Set batch size for implicit batch engines (default = 1)
  --shapes=spec               Set input shapes for dynamic shapes inference inputs.

                              Note: Use of dynamic shapes implies explicit batch.

                                    Input names can be wrapped with escaped single quotes (ex: \'Input:0\').

                              Example input shapes spec: input0:1x3x256x256, input1:1x3x128x128
                              Each input shape is supplied as a key-value pair where key is the input name and
                              value is the dimensions (including the batch dimension) to be used for that input.

                              Each key-value pair has the key and value separated using a colon (:).

                              Multiple input shapes can be provided via comma-separated key-value pairs.

  --loadInputs=spec           Load input values from files (default = generate random inputs). Input names can be wrapped with single quotes (ex: 'Input:0')
                              Input values spec ::= Ival[","spec]
                                           Ival ::= name":"file
  --iterations=N              Run at least N inference iterations (default = 10)
  --warmUp=N                  Run for N milliseconds to warmup before measuring performance (default = 200)
  --duration=N                Run performance measurements for at least N seconds wallclock time (default = 3)
  --sleepTime=N               Delay inference start with a gap of N milliseconds between launch and compute (default = 0)
  --streams=N                 Instantiate N engines to use concurrently (default = 1)
  --exposeDMA                 Serialize DMA transfers to and from device. (default = disabled)
  --noDataTransfers           Do not transfer data to and from the device during inference. (default = disabled)
  --useSpinWait               Actively synchronize on GPU events. This option may decrease synchronization time but increase CPU usage and power (default = disabled)
  --threads                   Enable multithreading to drive engines with independent threads (default = disabled)
  --useCudaGraph              Use cuda graph to capture engine execution and then launch inference (default = disabled)
  --separateProfileRun        Do not attach the profiler in the benchmark run; if profiling is enabled, a second profile run will be executed (default = disabled)
  --buildOnly                 Skip inference perf measurement (default = disabled)

=== Build and Inference Batch Options ===
                              When using implicit batch, the max batch size of the engine, if not given,
                              is set to the inference batch size;
                              when using explicit batch, if shapes are specified only for inference, they
                              will be used also as min/opt/max in the build profile; if shapes are
                              specified only for the build, the opt shapes will be used also for inference;
                              if both are specified, they must be compatible; and if explicit batch is
                              enabled but neither is specified, the model must provide complete static
                              dimensions, including batch size, for all inputs

=== Reporting Options ===
  --verbose                   Use verbose logging (default = false)
  --avgRuns=N                 Report performance measurements averaged over N consecutive iterations (default = 10)
  --percentile=P              Report performance for the P percentage (0<=P<=100, 0 representing max perf, and 100 representing min perf; (default = 99%)
  --dumpRefit                 Print the refittable layers and weights from a refittable engine
  --dumpOutput                Print the output tensor(s) of the last inference iteration (default = disabled)
  --dumpProfile               Print profile information per layer (default = disabled)
  --exportTimes=<file>        Write the timing results in a json file (default = disabled)
  --exportOutput=<file>       Write the output tensors to a json file (default = disabled)
  --exportProfile=<file>      Write the profile information per layer in a json file (default = disabled)

=== System Options ===
  --device=N                  Select cuda device N (default = 0)
  --useDLACore=N              Select DLA core N for layers that support DLA (default = none)
  --allowGPUFallback          When DLA is enabled, allow GPU fallback for unsupported layers (default = disabled)
  --plugins                   Plugin library (.so) to load (can be specified multiple times)
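The shapes spec described in the help text is easy to get wrong by hand, for example with a typo such as 2224 instead of 224. Each entry is a `name:dims` pair, pairs are separated by commas, and dimensions are joined with `x`. Below is a minimal Python sketch that parses such a spec and checks a min/opt/max triple. It relies on TensorRT's requirement that, for an optimization profile, min <= opt <= max in every dimension. The function names are hypothetical, and quoted names are handled only in the simple `'Input:0'` form shown in the help text.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def parse_shapes(spec: str) -> Dict[str, Shape]:
    """Parse a trtexec shapes spec, e.g. 'input0:1x3x256x256,input1:1x3x128x128'."""
    shapes: Dict[str, Shape] = {}
    for pair in spec.split(","):
        pair = pair.strip()  # tolerate a space after the comma
        # Split on the LAST colon so names such as 'Input:0' keep their own colon.
        name, sep, dims = pair.rpartition(":")
        if not sep or not name or not dims:
            raise ValueError(f"expected name:dims, got {pair!r}")
        shapes[name.strip("'")] = tuple(int(d) for d in dims.split("x"))
    return shapes


def check_profile(min_spec: str, opt_spec: str, max_spec: str) -> Dict[str, Tuple[Shape, Shape, Shape]]:
    """Validate a --minShapes/--optShapes/--maxShapes triple and return it per input."""
    mins, opts, maxs = parse_shapes(min_spec), parse_shapes(opt_spec), parse_shapes(max_spec)
    if not (mins.keys() == opts.keys() == maxs.keys()):
        raise ValueError("min, opt and max must list the same inputs")
    profile = {}
    for name, lo in mins.items():
        opt, hi = opts[name], maxs[name]
        if not (len(lo) == len(opt) == len(hi)):
            raise ValueError(f"{name}: min/opt/max have different ranks")
        if not all(a <= b <= c for a, b, c in zip(lo, opt, hi)):
            raise ValueError(f"{name}: need min <= opt <= max in every dimension, got {lo}, {opt}, {hi}")
        profile[name] = (lo, opt, hi)
    return profile
```

Running `check_profile` on the three strings before launching a long engine build catches a misplaced digit immediately, rather than after trtexec has parsed the model.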

Then run:

trtexec --onnx=flame_sim.onnx --saveEngine=flame_sim.engine --best --workspace=1024 --minShapes=input:1x3x224x224 --optShapes=input:1x3x224x224 --maxShapes=input:1x3x224x224

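Remember to set the workspace size. The default is only 16 (the unit is MB), which is why the command passes --workspace=1024. In TensorRT 8.4 and later, --workspace is deprecated in favour of --memPoolSize=workspace:1024.

--minShapes, --optShapes and --maxShapes set the minimum, optimal and maximum input shapes. The name before the colon must match the input name given at export time (input in the script above), and the dimensions must match your model's input size. The three flags only take effect if that input is dynamic in the ONNX model. flame_sim.onnx is the simplified model produced with onnx-simplifier, as described below.

Also, do not save the engine into the bin directory. Writing it there may fail with an engine-saving error, most likely because /usr/src/tensorrt/bin is not writable by a normal user.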
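Copying the command from a web page can turn `--` into an en dash (–), which trtexec will not recognise as a flag. Building the argument list in code avoids that, and it also avoids shell quoting problems. This is a minimal sketch. The helper names are hypothetical, and the flags are only the ones used in the command above. The writability check mirrors the advice not to save the engine under bin.

```python
import os
import subprocess
from typing import List, Sequence


def shape_arg(name: str, dims: Sequence[int]) -> str:
    """Format one input shape the way trtexec expects, e.g. 'input:1x3x224x224'."""
    return name + ":" + "x".join(str(d) for d in dims)


def trtexec_cmd(onnx_path: str, engine_path: str, input_name: str,
                min_dims: Sequence[int], opt_dims: Sequence[int], max_dims: Sequence[int],
                workspace_mb: int = 1024, best: bool = True,
                trtexec: str = "trtexec") -> List[str]:
    """Build the trtexec argument list used in this post, with plain ASCII '--' flags."""
    cmd = [trtexec, f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if best:
        cmd.append("--best")  # enable all precisions, per the help text
    # Workspace size in MB; newer TensorRT versions use --memPoolSize=workspace:<size> instead.
    cmd.append(f"--workspace={workspace_mb}")
    cmd.append(f"--minShapes={shape_arg(input_name, min_dims)}")
    cmd.append(f"--optShapes={shape_arg(input_name, opt_dims)}")
    cmd.append(f"--maxShapes={shape_arg(input_name, max_dims)}")
    return cmd


def build_engine(cmd: List[str], engine_path: str) -> None:
    """Run trtexec, refusing to start if the engine's target directory is not writable."""
    target_dir = os.path.dirname(os.path.abspath(engine_path))
    if not os.access(target_dir, os.W_OK):
        raise PermissionError(f"cannot write engine to {target_dir}")
    subprocess.run(cmd, check=True)  # raises CalledProcessError if trtexec fails
```

Usage: `build_engine(trtexec_cmd('flame_sim.onnx', 'flame_sim.engine', 'input', (1, 3, 224, 224), (1, 3, 224, 224), (1, 3, 224, 224)), 'flame_sim.engine')`.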

Sometimes the conversion prints this warning:

onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.

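It means the ONNX model was generated with INT64 weights. TensorRT does not natively support INT64, so it tries to cast them down to INT32. The warning is usually harmless as long as the values fit in INT32. The workaround used here is to simplify the ONNX model with onnx-simplifier, which you can install with pip install onnx-simplifier. Once it is installed, run:

python -m onnxsim flame.onnx flame_sim.onnx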
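To see which weights trigger the warning, and whether any of them would actually change when cast to INT32, you can inspect the model's INT64 initializers. For example, slice end indices are often exported as the largest INT64 value. The sketch below is minimal and assumes the `onnx` package (with its numpy dependency) is installed. `onnx.load`, `onnx.TensorProto.INT64` and `onnx.numpy_helper.to_array` are part of that package, while the helper names are hypothetical. It only looks at graph initializers, not Constant nodes.

```python
from typing import Iterable, List, Tuple

INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def fits_int32(values: Iterable[int]) -> bool:
    """True if every value survives an INT64 -> INT32 cast unchanged."""
    return all(INT32_MIN <= int(v) <= INT32_MAX for v in values)


def int64_initializers(path: str) -> List[Tuple[str, bool]]:
    """List (name, fits_in_int32) for every INT64 initializer in an ONNX file."""
    import onnx  # lazy import: only needed when inspecting a real model
    from onnx import numpy_helper

    model = onnx.load(path)
    report = []
    for init in model.graph.initializer:  # Constant nodes are not inspected here
        if init.data_type == onnx.TensorProto.INT64:
            arr = numpy_helper.to_array(init)
            report.append((init.name, fits_int32(arr.ravel().tolist())))
    return report
```

Comparing `int64_initializers('flame.onnx')` with `int64_initializers('flame_sim.onnx')` shows what onnx-simplifier changed. Any entry reporting `False` holds values that cannot be represented exactly in INT32.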

Original: https://blog.csdn.net/chaocainiao/article/details/124197430
Author: 静待有缘人
Title: tensorRT踩坑日常之训练模型转ONNX转engine

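Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/649889/

Reposted articles remain under the original author's copyright. When reposting, please credit the original author.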
