System environment
Ubuntu 20.04 LTS, where the system python3 package defaults to Python 3.8. I tested this on an overseas Linode cloud host.
Install the environment:
sudo apt update
sudo apt-get install python3 cmake sox libsndfile1-dev ffmpeg flac -y
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3 get-pip.py
git clone https://github.com/espnet/espnet
cd espnet/tools
./setup_python.sh $(command -v python3)
Work around a matplotlib installation problem by installing the FreeType development headers:
sudo apt-get install libfreetype6-dev -y
pip install torch==1.7.1 chainer==6.0.0 kaldiio espnet espnet_model_zoo
make TH_VERSION=1.7.1 CPU_ONLY=0
Install utilities:
sudo apt install lrzsz pcp -y
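Before moving on to the recognition script, it can help to confirm that the command-line tools installed above are actually on PATH. The following is a minimal sketch, not part of the original setup: the helper names `missing_tools` and `python_ok` are hypothetical, the tool list is taken from the apt commands above, and the Python 3.8 minimum is only an assumption based on the version used in this post.

```python
import shutil
import sys

# Command names taken from the apt packages installed above (assumed list).
REQUIRED_TOOLS = ["cmake", "sox", "ffmpeg", "flac", "git", "curl"]


def missing_tools(names):
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def python_ok(min_version=(3, 8)):
    """Check that the running interpreter is at least min_version."""
    return sys.version_info[:2] >= min_version


if __name__ == "__main__":
    status = "ok" if python_ok() else "older than assumed minimum"
    print("Python:", sys.version.split()[0], status)
    missing = missing_tools(REQUIRED_TOOLS)
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

Run it with the same python3 that was passed to setup_python.sh, so the check reflects the interpreter ESPnet will use.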
Recognition test script (Python):
import soundfile
from espnet_model_zoo.downloader import ModelDownloader
from espnet2.bin.asr_inference import Speech2Text

d = ModelDownloader()
speech2text = Speech2Text(
    **d.download_and_unpack("kamo-naoyuki/aishell_conformer"),
    # Decoding parameters are not included in the model file
    maxlenratio=0.0,
    minlenratio=0.0,
    beam_size=20,
    ctc_weight=0.3,
    lm_weight=0.5,
    penalty=0.0,
    nbest=1
)

# Read the test recording; rate is the file's sample rate in Hz
audio, rate = soundfile.read("t.wav")
nbests = speech2text(audio)
# Each n-best entry is a tuple whose first element is the recognized text
text, *_ = nbests[0]
print(text)
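The script above reads `rate` but never checks it. The sketch below inspects a WAV header before decoding. It assumes the model expects 16 kHz mono input, which matches the AISHELL-1 recordings but is not stated in the original post. The helper names `describe_wav` and `check_for_model` are hypothetical, and the standard-library `wave` module only reads plain PCM WAV files.

```python
import wave


def describe_wav(path):
    """Read sample rate, channel count and duration from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {"rate": rate, "channels": channels, "seconds": frames / rate}


def check_for_model(info, expected_rate=16000):
    """Return a list of problems; an empty list means the file looks usable.

    expected_rate=16000 is an assumption about the pretrained model.
    """
    problems = []
    if info["rate"] != expected_rate:
        problems.append(
            f"sample rate is {info['rate']} Hz, expected {expected_rate} Hz"
        )
    if info["channels"] != 1:
        problems.append(f"{info['channels']} channels, expected mono")
    return problems
```

If `check_for_model(describe_wav("t.wav"))` reports a mismatch, the file can be converted with the sox tool installed earlier, for example `sox t.wav -r 16000 -c 1 t16k.wav`, and the converted file passed to the script instead.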
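Performance: inference decoding is fairly fast, but it depends on the CPU. During inference all 8 CPU threads were fully loaded, and the result came back in about 5 seconds.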
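To make the timing above reproducible, decoding can be wrapped in a small timer and reported as a real-time factor (decode time divided by audio duration). This is only a sketch: `timed_call` and `real_time_factor` are hypothetical helper names, not part of ESPnet.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def real_time_factor(decode_seconds, audio_seconds):
    """Decode time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds


# Intended use with the recognition script (not executed here):
#   nbests, elapsed = timed_call(speech2text, audio)
#   print(real_time_factor(elapsed, len(audio) / rate))
```

If saturating every core is undesirable, PyTorch's `torch.set_num_threads` can cap the number of threads used for CPU inference. How much that changes the decode time on this model was not measured here.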
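Recognition quality:
In the test file t.wav, what I actually said was: "我说一句话,你给我识别一下看看,看效果怎么样" ("I'll say a sentence, you recognize it for me, and let's see how well it works").
The recognized Chinese text was: "我说一句话你不要看看他说我什么样" (the opening 我说一句话 is correct; the rest is largely misrecognized).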
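A single comparison like this can be quantified with the character error rate (CER), the usual metric for Mandarin ASR: the character-level edit distance between hypothesis and reference, divided by the reference length. The sketch below is a plain Levenshtein implementation rather than ESPnet's own scoring scripts. Stripping punctuation with `str.isalnum` is an assumption about how the two strings should be normalized.

```python
def normalize(text):
    """Keep letters, digits and CJK characters; drop punctuation and spaces."""
    return [ch for ch in text if ch.isalnum()]


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate of hypothesis against reference."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference contains no characters to score")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = "我说一句话,你给我识别一下看看,看效果怎么样"
    hypothesis = "我说一句话你不要看看他说我什么样"
    print(f"CER: {cer(reference, hypothesis):.2%}")
```

One utterance is far too little data to judge the model, so a figure computed this way is only a rough indication.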
Original: https://blog.csdn.net/huxuanlai/article/details/111646193
Author: huxuanlai
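Title: 端到端语音识别的espnet在cpu上aishell预训练模型中文语音配置跑通 (Getting ESPnet end-to-end speech recognition running on CPU for Chinese speech with the pretrained aishell model)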