Kaldi decoding: investigating an anomaly where silence is decoded as "啊"

label: 对二幺五嗯好好谢谢嗯
asr pred: Demo 啊 对 哎呀 捂 好 谢谢
phone sequence: Demo a1_S sil d_B w E4 Y_E A1_B Y y A1_E w_B u3_E sil h_B a3 W_E x_B y E4 x y E4_E sil

First, investigate the AM (acoustic model)

● Extract audio features and apply CMVN

echo "Demo $wav_file" | compute-fbank-feats --num-mel-bins=40 --sample-frequency=8000 scp:- ark,t:- | apply-cmvn-online asr_model_online/online_model/20211212/global_cmvn.stats ark,t:- ark,t:- > feat.txt
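For intuition, global CMVN is just per-dimension mean/variance normalization of the feature matrix. A minimal numpy sketch of the global (non-sliding-window) variant, with a toy matrix standing in for feat.txt:

```python
import numpy as np

def apply_global_cmvn(feats, mean, var, norm_vars=True):
    """Normalize a (frames x dims) feature matrix with global CMVN stats."""
    out = feats - mean            # zero the per-dimension mean
    if norm_vars:
        out = out / np.sqrt(var)  # scale to unit variance
    return out

# Toy 3-frame, 2-dim "fbank" matrix standing in for feat.txt contents.
feats = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
mean, var = feats.mean(axis=0), feats.var(axis=0)
norm = apply_global_cmvn(feats, mean, var)
print(norm.mean(axis=0))  # ~[0, 0]
print(norm.var(axis=0))   # ~[1, 1]
```

Note that `apply-cmvn-online` above uses running statistics per frame, so its output will differ slightly from this batch version at utterance starts.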

● Use nnet3-latgen-faster / nnet3-latgen-faster-batch to decode the audio, producing the recognized text along with a lattice, then visualize the lattice with show_lattice.sh:

nnet3-latgen-faster --frames-per-chunk=50 --extra-left-context=10 --extra-right-context=0 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=graph/graph_zjtb/words.txt exp/exp_zjtb/tdnn_0730/final.mdl graph/graph_zjtb/HCLG.fst ark,t:feat.txt "ark:test.lats"

./utils/show_lattice.sh --mode save Demo asr/lattice_test/test.lats.gz asr/lattice_test/words.txt

● nnet3-compute outputs frame-level predictions, i.e. for each frame a list of probabilities over all pdfs (pdf ids are 0-based). The whole matrix can be opened in Excel; with a conditional color gradient on the values, you can see which pdf has the highest probability in each frame, i.e. that frame's predicted pdf.

nnet3-compute --apply-exp=true exp/exp_zjtb/tdnn_0730/final.mdl "ark,t:feat.txt" ark,t:- > test_frame_res.txt
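Instead of eyeballing the matrix in Excel, the best pdf per frame can be pulled out with a few lines of numpy. A sketch, assuming test_frame_res.txt holds a single utterance in Kaldi text-matrix format (utterance id, `[`, one row per frame, `]`):

```python
import numpy as np

def frame_argmax(ark_text):
    """Parse a single-utterance Kaldi text matrix; return the best pdf per frame."""
    rows = []
    for line in ark_text.strip().splitlines():
        toks = line.replace("[", " ").replace("]", " ").split()
        vals = [t for t in toks if not t[0].isalpha()]  # drop the utterance id
        if vals:
            rows.append([float(v) for v in vals])
    probs = np.array(rows)
    return probs.argmax(axis=1)  # 0-based pdf id with the highest probability

# Toy 3-frame matrix over 4 pdfs, standing in for test_frame_res.txt.
ark = """Demo  [
  0.1 0.7 0.1 0.1
  0.2 0.2 0.5 0.1
  0.8 0.1 0.05 0.05 ]"""
print(frame_argmax(ark))  # [1 2 0]
```

For the silence-vs-啊 question, this sequence can then be checked against the pdf ids of sil and a1 from show-transitions below.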

● show-transitions prints the HMM transition model in human-readable format:

show-transitions graph/graph_zjtb/phones.txt exp/exp_zjtb/tdnn_0730/final.mdl > test_occs.txt

Output, where the pdf values are pdf ids:

Transition-state 1744: phone = d_B hmm-state = 2 pdf = 3184
Transition-id = 3487 p = 0.01 [self-loop]
Transition-id = 3488 p = 0.99 [2 -> 3]
Transition-state 1745: phone = d_B hmm-state = 2 pdf = 3351
Transition-id = 3489 p = 0.311331 [self-loop]
Transition-id = 3490 p = 0.688669 [2 -> 3]
Transition-state 1746: phone = e1 hmm-state = 0 pdf = 71
Transition-id = 3491 p = 0.01 [self-loop]
Transition-id = 3492 p = 0.990001 [0 -> 1]
Transition-state 1747: phone = e1 hmm-state = 0 pdf = 1136
Transition-id = 3493 p = 0.01 [self-loop]
Transition-id = 3494 p = 0.99 [0 -> 1]
Transition-state 1748: phone = e1 hmm-state = 0 pdf = 1675
Transition-id = 3495 p = 0.0854856 [self-loop]
Transition-id = 3496 p = 0.914514 [0 -> 1]
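To check which phone a suspicious pdf id belongs to, the show-transitions dump can be turned into a pdf → phone map with a regex pass. A sketch over the excerpt above (note that in a tree-clustered model one pdf may be shared by several transition-states, hence a set per pdf):

```python
import re
from collections import defaultdict

def pdf_to_phones(transitions_text):
    """Map each pdf id to the set of phones whose HMM states use it."""
    mapping = defaultdict(set)
    pat = re.compile(r"phone = (\S+) hmm-state = \d+ pdf = (\d+)")
    for m in pat.finditer(transitions_text):
        mapping[int(m.group(2))].add(m.group(1))
    return mapping

text = """Transition-state 1744: phone = d_B hmm-state = 2 pdf = 3184
Transition-state 1746: phone = e1 hmm-state = 0 pdf = 71
Transition-state 1747: phone = e1 hmm-state = 0 pdf = 1136"""
m = pdf_to_phones(text)
print(m[3184])  # {'d_B'}
```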

Next, investigate how the LM scores different sentences:

● Introduction to srilm's ngram language models: https://web.stanford.edu/~jurafsky/slp3/3.pdf
● The higher the probability, the lower the perplexity.

ngram -lm zhijian_lm.gz -ppl test.txt -debug 2

Contents of test.txt:

啊 对 哎呀 捂 好 谢谢
对 哎呀 捂 好 谢谢

Output:

reading 890937 1-grams
reading 14733024 2-grams
reading 13701101 3-grams
啊 对 哎呀 捂 好 谢谢
p( 啊 | &lt;s&gt; ) = [2gram] 0.1468152 [ -0.8332291 ]
p( 对 | 啊 …) = [3gram] 0.06473259 [ -1.188877 ]
p( 哎呀 | 对 …) = [3gram] 0.0001231799 [ -3.90946 ]
p( 捂 | 哎呀 …) = [2gram] 3.734564e-05 [ -4.42776 ]
p( 好 | 捂 …) = [2gram] 0.04002296 [ -1.397691 ]
p( 谢谢 | 好 …) = [2gram] 0.04092193 [ -1.388044 ]
p( &lt;/s&gt; | 谢谢 …) = [3gram] 0.1421139 [ -0.8473634 ]
1 sentences, 6 words, 0 OOVs
0 zeroprobs, logprob= -13.99242 ppl= 99.75112 ppl1= 214.818

对 哎呀 捂 好 谢谢
p( 对 | &lt;s&gt; ) = [2gram] 0.02417497 [ -1.616634 ]
p( 哎呀 | 对 …) = [3gram] 0.0001068959 [ -3.971039 ]
p( 捂 | 哎呀 …) = [2gram] 3.734564e-05 [ -4.42776 ]
p( 好 | 捂 …) = [2gram] 0.04002296 [ -1.397691 ]
p( 谢谢 | 好 …) = [2gram] 0.04092193 [ -1.388044 ]
p( &lt;/s&gt; | 谢谢 …) = [3gram] 0.1421139 [ -0.8473634 ]
1 sentences, 5 words, 0 OOVs
0 zeroprobs, logprob= -13.64853 ppl= 188.2588 ppl1= 536.6687

file test.txt: 2 sentences, 11 words, 0 OOVs
0 zeroprobs, logprob= -27.64096 ppl= 133.7295 ppl1= 325.6973
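The SRILM numbers above can be reproduced by hand: ppl = 10^(-logprob / (words + sentences)), where the end-of-sentence token counts toward the denominator, while ppl1 excludes it. A check against the first sentence (logprob is base-10, as SRILM reports it):

```python
# First sentence above: 6 words, 1 sentence, total log10 prob -13.99242.
logprob, words, sents = -13.99242, 6, 1

ppl = 10 ** (-logprob / (words + sents))   # counts </s>: matches ppl= 99.75112
ppl1 = 10 ** (-logprob / words)            # excludes </s>: matches ppl1= 214.818
print(round(ppl, 2), round(ppl1, 2))
```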

The extra "啊" was introduced by the LM: the ngram training corpus really does contain a very large number of sentences beginning with 啊. So the focus shifted to optimizing the language model, as follows:
● Compute the frequency of sentence-initial characters; for sentences starting with a filler word (啊, 嗯, 哦, 呃, 我, etc.), keep 30% as-is and strip the initial character from the rest.

● Clean out meaningless English words in which the same letter repeats many times in a row.

● Run multiple rounds of WER evaluation to check the effect.
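The first cleanup step above can be sketched as a single pass over the LM training text: count sentence-initial characters, then keep roughly 30% of filler-initial sentences intact and strip the leading filler from the rest. The filler list and the 30% ratio come from the text; the function and seed are illustrative:

```python
import random
from collections import Counter

FILLERS = {"啊", "嗯", "哦", "呃", "我"}  # filler starts listed in the text
KEEP_RATIO = 0.30                          # fraction of filler starts to keep

def clean_corpus(sentences, rng=random.Random(0)):
    """Count sentence-initial chars, then downsample filler-word starts."""
    first_char_freq = Counter(s[0] for s in sentences if s)
    out = []
    for s in sentences:
        if s and s[0] in FILLERS and rng.random() >= KEEP_RATIO:
            s = s[1:].lstrip()  # strip the filler-initial character
        if s:
            out.append(s)
    return first_char_freq, out

freq, cleaned = clean_corpus(["啊 对 好", "对 好 的", "嗯 谢谢"])
print(freq["啊"], cleaned)
```

After cleaning, the LM would be re-estimated from the modified corpus and re-scored with ngram -ppl as above.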

Original: https://blog.csdn.net/weixin_40103562/article/details/125545279
Author: phoenix-bai
Title: kaldi解码中, 排查静音对应到“啊“的异常问题
