ESPnet ASR Demo & Quantization Document

  • This document describes how to run the ESPnet (v1) ASR demo and how to quantize its model
  • Test environment:


Note: Follow the original installation guide provided by ESPnet. The notes below cover only the points that need extra attention.


Install Kaldi

  • The Kaldi installation has two parts: (1) tools installation and (2) src installation. Make sure to install both, in that order
  • Once installed, many compiled files (.o objects and binaries) can be found in directories such as: <kaldi-root>/{featbin,fgmmbin,fstbin,etc.}
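
As a quick sanity check of the build, the sketch below lists the executable files in a few of those directories. It is only an illustration: the function names and the bin_root argument are hypothetical, and bin_root is assumed to be whichever directory contains featbin, fgmmbin, and so on in your Kaldi checkout.

```python
import os
from pathlib import Path


def list_kaldi_executables(bin_root, subdirs=("featbin", "fgmmbin", "fstbin")):
    """Return {subdir: sorted executable file names} for each listed directory.

    bin_root is assumed to be the directory that contains featbin, fgmmbin, ...
    A missing directory is reported with an empty list.
    """
    found = {}
    for name in subdirs:
        directory = Path(bin_root) / name
        if not directory.is_dir():
            found[name] = []  # directory missing: this part was not built
            continue
        found[name] = sorted(
            p.name
            for p in directory.iterdir()
            if p.is_file() and os.access(p, os.X_OK)
        )
    return found


def missing_kaldi_dirs(bin_root, subdirs=("featbin", "fgmmbin", "fstbin")):
    """Return the subdirectories that contain no executable files."""
    listing = list_kaldi_executables(bin_root, subdirs)
    return [name for name, files in listing.items() if not files]
```

If missing_kaldi_dirs(...) returns a non-empty list, the src build most likely did not finish for those directories.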

Install ESPnet

  • Kaldi should be linked into <espnet>/tools (check the guide)
  • This document uses "Option A) Setup Anaconda environment" from the guide, so a virtual environment named espnet is created with python==3.8
  • The current CUDA version is 11.6, which is not compatible with pytorch 1.10.1, so ESPnet should be installed with $ make TH_VERSION=1.10.1 CUDA_VERSION=11.3, which specifies the pytorch and CUDA versions explicitly
  • The custom tools listed under "[Optional] Custom tool installation" are not installed
  • Install chainer in the espnet conda environment with pip install chainer==6.0.0 (cupy is not installed because of errors during its installation)
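
After installation, the pinned versions can be double-checked from inside the espnet environment. The sketch below is a minimal illustration using only the Python standard library. It assumes the packages expose standard metadata under the names torch and chainer; check_versions and EXPECTED are hypothetical names introduced here.

```python
from importlib import metadata

# Versions pinned in the notes above.
EXPECTED = {"torch": "1.10.1", "chainer": "6.0.0"}


def check_versions(expected, get_version=metadata.version):
    """Return {package: (expected, found)} for every package that does not match.

    found is None when the package is not installed. A local build suffix
    such as "+cu113" is ignored when comparing.
    """
    problems = {}
    for package, wanted in expected.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            found = None
        if found is None or found.split("+")[0] != wanted:
            problems[package] = (wanted, found)
    return problems
```

Calling check_versions(EXPECTED) returns an empty dict when both packages match the pinned versions.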

This demo decodes (transcribes) a .wav audio file into words.

To quantize the model from FP32 to INT8

ESPnet provides dynamic quantization through the pytorch API.
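
To give a rough sense of what an FP32 to INT8 conversion does numerically, the sketch below quantizes a small list of weights with a simplified symmetric per-tensor scheme. It is an illustration only, not the code that ESPnet or pytorch runs: the function names are hypothetical, and real implementations choose the scale and handle activations differently.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
# Each restored value differs from the original by at most about scale / 2.
errors = [abs(w - r) for w, r in zip(weights, restored)]
```

The model gets smaller because each weight is stored as an 8-bit code plus one shared scale, at the cost of the small rounding error computed in errors.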

To enable dynamic quantization, add the following options at lines 248-249 of the espnet/utils/ file:

        --quantize-asr-model True \
        --quantize-dtype "qint8" \

Decoding can now be performed as described in the previous section.

Author: GLinttsd
