[Speech Recognition] WeNet: An E2E Speech Recognition Toolkit for Industrial Deployment

May 27, 2023, 5:19 PM • Artificial Intelligence

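Contents

WeNet: An E2E Speech Recognition Toolkit for Industrial Deployment
I. Setting up the WeNet speech recognition platform
- 1. References
- 2. Quick setup of the WeNet platform
II. Inference with WeNet (inference with the ONNX CPU version does not work yet)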

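I. Setting up the WeNet speech recognition platform

1. References

2. Quick setup of the WeNet platform

See the WeNet Chinese documentation.

Download the official pretrained model, start the Docker service, and load the model; this provides a speech recognition service over the WebSocket protocol.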


wget https://wenet-1256283475.cos.ap-shanghai.myqcloud.com/models/aishell2/20210618_u2pp_conformer_libtorch.tar.gz
tar -xf 20210618_u2pp_conformer_libtorch.tar.gz
model_dir=$PWD/20210618_u2pp_conformer_libtorch
docker run --rm -it -p 10086:10086 -v $model_dir:/home/wenet/model wenetorg/wenet-mini:latest bash /home/run.sh

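Note:

Here $PWD is "/home/wenet/model".
Make sure the pretrained model is stored in the right place: extract it under $PWD, then run model_dir=$PWD/20210618_u2pp_conformer_libtorch to set the variable. Otherwise the service reports an error.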

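Real-time recognition

Open the file index.html in a browser and enter ws://127.0.0.1:10086 in the WebSocket URL field (if Docker runs through WSL2 on Windows, use ws://localhost:10086). Allow the browser's request to use the microphone, and you can then run real-time speech recognition from the microphone.

This demonstration uses Docker under WSL2. When speaking close to the microphone, the misrecognition rate is fairly low.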
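For readers who want to drive the service from a script instead of index.html, the sketch below prepares the frames a client would send. It only uses the standard library and does not open a connection. The message fields ("signal", "nbest", "continuous_decoding") and the 16 kHz, 16-bit mono PCM format are assumptions about the server's WebSocket protocol and are not stated in this article, so check them against the WeNet runtime documentation before relying on them.

```python
import json

SAMPLE_RATE = 16000    # assumption: the server expects 16 kHz mono audio
BYTES_PER_SAMPLE = 2   # assumption: 16-bit PCM samples


def start_message(nbest=1, continuous_decoding=True):
    """Build the JSON text frame assumed to open a decoding session."""
    return json.dumps({
        "signal": "start",
        "nbest": nbest,
        "continuous_decoding": continuous_decoding,
    })


def end_message():
    """Build the JSON text frame assumed to close a decoding session."""
    return json.dumps({"signal": "end"})


def pcm_chunks(pcm, seconds=0.5):
    """Split raw PCM bytes into fixed-duration chunks, one per binary frame."""
    size = int(SAMPLE_RATE * seconds) * BYTES_PER_SAMPLE
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


if __name__ == "__main__":
    one_second_of_silence = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)
    frames = pcm_chunks(one_second_of_silence)
    print(start_message())
    print(len(frames), "binary frames of", len(frames[0]), "bytes each")
    print(end_message())
```

A real client would send the start message as a text frame to ws://127.0.0.1:10086, then each chunk as a binary frame, then the end message, using any WebSocket client library.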

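II. Inference with WeNet (inference with the ONNX CPU version does not work yet)

Note:

If you only use wenet/bin/recognize.py to run inference with the libTorch model, the environment can be set up on Windows; see the WeNet official site for the setup steps.
If you want to run inference with wenet/bin/recognize_onnx.py, you first need ctc_decoder. Note that the ctc_decoder package on PyPI only has a 2020 release (WeNet 1.0), which does not match the current WeNet 3.0, so it has to be downloaded from https://github.com/Slyne/ctc_decoder and compiled. Compiling swig_decoders requires bash, so I ran it on Linux, using WSL + Ubuntu.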

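1. Setting up the WeNet environment

Because the goal is to try ONNX inference, WSL + Ubuntu is used here.

For a guide to WSL + Docker Desktop, see "Setting up a Python environment with WSL Ubuntu + Docker Desktop".

Once WSL and Docker Desktop are installed, configure the WeNet environment as follows.

Create the anaconda container: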

docker run -it --name="anaconda" -p 8888:8888 continuumio/anaconda3 /bin/bash

If you have exited the container, restart it:


docker start anaconda
docker exec -it anaconda /bin/bash

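Configure the WeNet environment in the base environment (do not create a virtual environment; this makes it easier to package the container as an image for PyCharm later). Copy the wenet project's requirements.txt from WSL into the Docker container (assuming the wenet project is under /home/usr in WSL):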

docker cp /home/usr/wenet/requirements.txt 9cf7b3c196f3:/home/

Inside the anaconda container, install all packages with pip from /home/ (for changing the conda source, see "Changing the conda source on Ubuntu"):

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
conda install pytorch=1.10.0 torchvision torchaudio=0.10.0 cudatoolkit=11.1 -c pytorch -c conda-forge

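Download the ctc_decoder project (it lets the conformer use beam search during recognition). Its repository is https://github.com/Slyne/ctc_decoder. Because cloning from GitHub may be unreliable inside Ubuntu, open swig/setup.sh on Windows, which fetches the following dependencies: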

#!/usr/bin/env bash

if [ ! -d kenlm ]; then
    git clone https://github.com/kpu/kenlm.git
    echo -e "\n"
fi

if [ ! -d openfst-1.6.3 ]; then
    echo "Download and extract openfst ..."
    wget http://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.6.3.tar.gz --no-check-certificate
    tar -xzvf openfst-1.6.3.tar.gz
    echo -e "\n"
fi

if [ ! -d ThreadPool ]; then
    git clone https://github.com/progschj/ThreadPool.git
    echo -e "\n"
fi

echo "Install decoders ..."

python3 setup.py install --user --num_processes 10

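Compile ctc_decoder: assuming you are inside the anaconda container and the ctc_decoder project is under /home, enter the swig folder and run bash setup.sh to compile it (first run apt install gcc and apt install g++).

Set up the onnx and onnxruntime environment: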

pip install onnx==1.9.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install onnxruntime==1.9.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

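Package the running Docker container as an image: commit the runtime environment of the anaconda container to an image so that PyCharm Professional can use it. See "Developing in PyCharm with the environment inside a Docker container".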


docker commit -a "wangxiaoxi" -m "wenet_env" 2b1ad7022d19  wenet_env:v1

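2. Model training

See Tutorial on AIShell.

3. Inference with the libTorch model

Download the aishell2 sample dataset to run inference with the WeNet model; the official site is AISHELL (希尔贝壳).

Download the WeNet pretrained model (the "Checkpoint model – conformer" download).

Put the test dataset and the pretrained model under the project path.

Change the location of cmvn_file in train.yaml (if you use the Python environment inside the Docker container, a relative path is recommended):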

cmvn_file: ../../test/aishell2/global_cmvn
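The edit above can also be scripted. The following is a minimal sketch that rewrites the cmvn_file line with a regular expression rather than a YAML parser, so it has no third-party dependencies and leaves the rest of the file untouched. The helper name set_cmvn_file and the sample YAML text are made up for illustration; they are not part of WeNet.

```python
import re


def set_cmvn_file(yaml_text, new_path):
    """Return yaml_text with the value of the cmvn_file key replaced."""
    pattern = re.compile(r"^([ \t]*cmvn_file[ \t]*:[ \t]*).*$", flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in new_path.
    updated, count = pattern.subn(lambda m: m.group(1) + new_path, yaml_text)
    if count == 0:
        raise ValueError("no cmvn_file key found")
    return updated


if __name__ == "__main__":
    # Illustrative sample, not the real train.yaml.
    sample = "cmvn_file: /some/absolute/path/global_cmvn\ninput_dim: 80\n"
    print(set_cmvn_file(sample, "../../test/aishell2/global_cmvn"))
```

To apply it to a real file, read train.yaml as text, pass it through the helper, and write the result back.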

Convert the aishell2 dataset into the WeNet data format:

{"key": "D4_753", "wav": "../../test/aishell2/test_data/D4_750.wav", "txt": ""}
{"key": "D4_754", "wav": "../../test/aishell2/test_data/D4_751.wav", "txt": ""}
{"key": "D4_755", "wav": "../../test/aishell2/test_data/D4_752.wav", "txt": ""}
{"key": "D4_753", "wav": "../../test/aishell2/test_data/D4_753.wav", "txt": ""}
{"key": "D4_754", "wav": "../../test/aishell2/test_data/D4_754.wav", "txt": ""}
{"key": "D4_755", "wav": "../../test/aishell2/test_data/D4_755.wav", "txt": ""}
{"key": "D4_756", "wav": "../../test/aishell2/test_data/D4_756.wav", "txt": ""}

Run wenet/bin/recognize.py with the following command:

python recognize.py \
--config=../../test/aishell2/train.yaml \
--dict=../../test/aishell2/units.txt \
--checkpoint=../../test/aishell2/final.pt \
--result_file=../../test/aishell2/att_res_result.txt \
--test_data=../../test/aishell2/test_data/data.list

The output is as follows:

Namespace(batch_size=16, beam_size=10, bpe_model=None, checkpoint='../../test/aishell2/final.pt', config='../../test/aishell2/train.yaml', connect_symbol='', ctc_weight=0.0, data_type='raw', decoding_chunk_size=-1, dict='../../test/aishell2/units.txt', gpu=-1, mode='attention', non_lang_syms=None, num_decoding_left_chunks=-1, override_config=[], penalty=0.0, result_file='../../test/aishell2/att_res_result.txt', reverse_weight=0.0, simulate_streaming=False, test_data='../../test/aishell2/test_data/data1.list')
2022-07-04 15:54:22,441 INFO Checkpoint: loading from checkpoint ../../test/aishell2/final.pt for CPU
F:\ASR\wenet\wenet\transformer\asr_model.py:266: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').

  best_hyps_index = best_k_index // beam_size
2022-07-04 15:54:27,189 INFO D4_753 中国人民保险集团股份有限公司闽宁营销服务部
2022-07-04 15:54:27,189 INFO D4_755 中国电信闽宁镇合作营业厅
2022-07-04 15:54:27,189 INFO D4_754 闽宁镇卫生院
2022-07-04 15:54:27,189 INFO D4_756 闽宁镇客运站
2022-07-04 15:54:27,189 INFO D4_753 第六十一集
2022-07-04 15:54:27,189 INFO D4_755 第六十三集
2022-07-04 15:54:27,189 INFO D4_754 第六十二集

4. Exporting an ONNX model from WeNet

See ONNX backend on WeNet.

First download the WeNet pretrained model (the "Checkpoint model – conformer" download), then run wenet/bin/export_onnx_cpu.py with the following arguments to convert the PyTorch .pt file into ONNX files:

python export_onnx_cpu.py \
 --config F:/ASR/model/20210618_u2pp_conformer_libtorch_aishell2/train.yaml \
 --checkpoint F:/ASR/model/20210618_u2pp_conformer_libtorch_aishell2/final.pt \
 --chunk_size 16 \
 --output_dir F:/ASR/model/20210618_u2pp_conformer_libtorch_aishell2/onnx_dir \
 --num_decoding_left_chunks -1

If the export succeeds, the output folder contains three files: encoder.onnx, ctc.onnx, and decoder.onnx.
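A quick way to confirm the export is to check for those three files. The sketch below does only that: it verifies the files exist, not that the models are valid. The helper name missing_onnx_files is illustrative.

```python
import tempfile
from pathlib import Path

# File names taken from the export step described in this article.
EXPECTED = ("encoder.onnx", "ctc.onnx", "decoder.onnx")


def missing_onnx_files(output_dir, expected=EXPECTED):
    """Return the expected file names that are absent from output_dir."""
    out = Path(output_dir)
    return [name for name in expected if not (out / name).is_file()]


if __name__ == "__main__":
    # Demo on a temporary folder holding only one of the three files.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "encoder.onnx").touch()
        print("missing:", missing_onnx_files(tmp))
```

In practice, pass the directory given to --output_dir; an empty list means all three files were written.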

5. Inference with `recognize_onnx` (unresolved)

See https://github.com/wenet-e2e/wenet/pull/761.

First download the weights of the conformer model (checkpoint model) from https://wenet.org.cn/wenet/pretrained_models.html

Extract the weight file.

Change the location of cmvn_file in train.yaml:


cmvn_file: ../../test/aishell2/global_cmvn

Convert the data to WeNet's JSON format: assuming there is an audio file D4_750.wav, convert it into the JSON format below. See https://wenet.org.cn/wenet/tutorial_librispeech.html?highlight=test_data#stage-0-prepare-training-data

{"key": "D4_753", "wav": "../../test/aishell2/test_data/D4_750.wav", "txt": "而对楼市成交抑制作用最大的限购"}

Then run:

python3 wenet/bin/recognize_onnx.py \
 --config=20210618_u2pp_conformer_exp/train.yaml \
 --test_data=raw_wav/test/data.list \
 --gpu=0 \
 --dict=20210618_u2pp_conformer_exp/words.txt \
 --mode=attention_rescoring \
 --reverse_weight=0.4 \
 --ctc_weight=0.1 \
 --result_file=./att_res_result.txt \
 --encoder_onnx=onnx_model/encoder.onnx \
 --decoder_onnx=onnx_model/decoder.onnx

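Note: it is best to use relative paths here, because the Python environment lives inside Docker. Reading a file through a Windows absolute path leads to the error below. For possible approaches, see https://github.com/microsoft/onnxruntime/issues/8735 (I was unable to resolve it myself).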

{FileNotFoundError}[Errno 2] No such file or directory: 'F:/ASR/model/20210618_u2pp_conformer_libtorch_aishell2/train.yaml'
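One way to get a relative path is to compute it from the Windows paths themselves. The sketch below uses the standard-library ntpath module, which applies Windows path rules on any operating system, and then switches to forward slashes. It assumes both directories sit inside the directory tree that is mounted into the container; if they do not, a relative path will not help either. The function name and the shortened example paths are illustrative.

```python
import ntpath
from pathlib import PureWindowsPath


def to_relative_posix(windows_path, windows_start):
    """Express windows_path relative to windows_start, with forward slashes.

    Both arguments are absolute Windows paths on the same drive.
    ntpath.relpath raises ValueError if the drives differ.
    """
    rel = ntpath.relpath(windows_path, windows_start)
    return PureWindowsPath(rel).as_posix()


if __name__ == "__main__":
    # Hypothetical layout: the script directory and the model directory
    # share the common root F:/ASR.
    print(to_relative_posix(
        "F:/ASR/model/my_model/train.yaml",
        "F:/ASR/wenet/wenet/bin",
    ))
```

The result is relative to the directory the script is started from, so it must be used from that same working directory inside the container.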

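Here the ONNX model exported by export_onnx_cpu is used for inference with recognize_onnx:

import numpy as np
import onnxruntime

# feats, feats_lengths and encoder_outpath are assumed to be defined earlier
# (for example inside recognize_onnx.py).
encoder_ort_session = onnxruntime.InferenceSession(
    encoder_outpath, providers=['CPUExecutionProvider'])
ort_inputs = {
    encoder_ort_session.get_inputs()[0].name: feats.astype('float32'),
    encoder_ort_session.get_inputs()[1].name: feats_lengths.astype('int64'),
    # Zero-filled placeholder tensors for the third and fourth inputs.
    encoder_ort_session.get_inputs()[2].name: np.zeros((12, 4, 0, 128)).astype('float32'),
    encoder_ort_session.get_inputs()[3].name: np.zeros((12, 1, 256, 7)).astype('float32'),
}
encoder_ort_session.run(None, ort_inputs)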

This raises the following error:

{Fail}[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Slice node. Name:'Slice_49' Status Message: slice.cc:153 FillVectorsFromInput Starts must be a 1-D array

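At first this looked like a version mismatch between CUDA and onnxruntime; see the post "OnnxRunTime: FAIL : Non-zero status code returned while running BatchNormalization node."

Later I found that recognize_onnx runs inference on models exported by export_onnx_gpu.py, not export_onnx_cpu.py. Using export_onnx_gpu.py also requires installing nvidia-docker and onnxruntime-gpu; otherwise it fails with: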

/opt/conda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:53: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'CPUExecutionProvider'
  warnings.warn("Specified provider '{}' is not in available provider names."
Traceback (most recent call last):
  File "/opt/project/wenet/bin/export_onnx_gpu.py", line 574, in <module>
    onnx_config = export_enc_func(model, configs, args, logger, encoder_onnx_path)
  File "/opt/project/wenet/bin/export_onnx_gpu.py", line 334, in export_offline_encoder
    test(to_numpy([o0, o1, o2, o3, o4]), ort_outs)
NameError: name 'test' is not defined
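The UserWarning at the top of this output can be anticipated by checking which execution providers the installed onnxruntime build offers before creating a session. The sketch below is a small helper for that check. onnxruntime.get_available_providers() is an existing onnxruntime function, while pick_providers is a name made up here. This only addresses the provider warning; it does not fix the NameError in export_onnx_gpu.py.

```python
def pick_providers(available,
                   preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise RuntimeError(f"none of {preferred} is available: {available}")
    return chosen


if __name__ == "__main__":
    try:
        import onnxruntime
        available = onnxruntime.get_available_providers()
    except ImportError:
        # Stand-in so the demo still runs when onnxruntime is not installed.
        available = ["CPUExecutionProvider"]
    print("available:", available)
    print("using:", pick_providers(available))
```

With a CPU-only onnxruntime package, as in the log above, only CPUExecutionProvider is selected; the returned list can be passed as the providers argument of onnxruntime.InferenceSession.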

I will not spend more effort on this here; better to wait for the WeNet project to mature.

Original: https://blog.csdn.net/qq_33934427/article/details/125602022
Author: 王小希ww
Title: [Speech Recognition] WeNet: An E2E Speech Recognition Toolkit for Industrial Deployment
