Summary of the WebRTC Noise Suppression Algorithm

Wiener gain

WebRTC NS uses a Wiener gain. Following the Wiener filtering principles covered in an earlier post, the gain function is
$$G(k,l)=\frac{|X(k,l)|^2}{|Y(k,l)|^2}=1-\frac{|N(k,l)|^2}{|Y(k,l)|^2} \tag{1}$$

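where $|Y(k,l)|$ is the magnitude spectrum of the noisy speech, $|X(k,l)|$ that of the clean speech, and $|N(k,l)|$ that of the noise; $k$ and $l$ are the frequency and time (frame) indices. Estimating the gain function therefore comes down to accurately estimating the spectral energy of the noise.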
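As a minimal sketch of Eq. (1), the gain can be computed per frequency bin from the noisy power and an estimate of the noise power. The function name `wiener_gain` and the clipping to `[floor, 1]` are assumptions made for illustration; the clipping only guards against a noise estimate that exceeds the noisy power.

```python
def wiener_gain(noisy_power, noise_power, floor=0.0):
    """Per-bin gain G = 1 - |N|^2 / |Y|^2 (Eq. 1), clipped to [floor, 1]."""
    gains = []
    for y2, n2 in zip(noisy_power, noise_power):
        if y2 <= 0.0:
            # Empty bin: nothing to keep, fall back to the floor (assumption).
            gains.append(floor)
            continue
        g = 1.0 - n2 / y2
        gains.append(min(1.0, max(floor, g)))
    return gains


# Two bins: one dominated by speech, one where the noise estimate is too high.
print(wiener_gain([4.0, 1.0], [1.0, 2.0]))
```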

Noise estimation

The noise is estimated by recursive smoothing over time, weighted by the speech presence probability:

$$\hat{\lambda}_d(k,l)=\gamma\hat{\lambda}_d(k,l-1) + (1-\gamma)\left((1-p(k,l))\,|Y(k,l)|^2 + p(k,l)\,\hat{\lambda}_d(k,l-1)\right) \tag{2}$$

Here $\gamma$ is the smoothing factor, $p(k,l)$ is the conditional speech presence probability estimated from $\{Y,F\}$, and $F$ denotes the speech features.
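The recursion in Eq. (2) can be sketched as below. The value `gamma=0.9` and the name `update_noise` are illustrative assumptions, not values taken from the WebRTC source; the current-frame term is treated as a per-bin power value.

```python
def update_noise(prev_noise, noisy_power, speech_prob, gamma=0.9):
    """One step of Eq. (2) for every frequency bin.

    prev_noise  : noise estimate from the previous frame
    noisy_power : current-frame spectrum value per bin
    speech_prob : speech presence probability p(k, l) per bin
    gamma       : smoothing factor (0.9 is an assumed example value)
    """
    return [
        gamma * n_prev + (1.0 - gamma) * ((1.0 - p) * y + p * n_prev)
        for n_prev, y, p in zip(prev_noise, noisy_power, speech_prob)
    ]
```

When speech is certainly present (p = 1) the estimate is held at its previous value; when speech is certainly absent (p = 0) it moves towards the current frame at a rate set by the smoothing factor.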

Estimating the speech presence probability conditioned on $\{Y,F\}$

Each frame falls under one of two hypotheses, speech absent or speech present:

$$\begin{aligned} H_0(k,l):\ Y(k,l) &= D(k,l) \\ H_1(k,l):\ Y(k,l) &= X(k,l) + D(k,l) \end{aligned} \tag{3}$$

Applying Bayes' rule gives the speech presence probability conditioned on $\{Y,\{F\}\}$:
$$\begin{aligned}
p(H_1(k,l)\mid Y(k,l),\{F\}) &= \frac{p(Y(k,l)\mid H_1(k,l),\{F\})\,p(H_1(k,l)\mid\{F\})\,p(\{F\})}{p(Y(k,l)\mid H_1(k,l),\{F\})\,p(H_1(k,l)\mid\{F\})\,p(\{F\}) + p(Y(k,l)\mid H_0(k,l),\{F\})\,p(H_0(k,l)\mid\{F\})\,p(\{F\})} \\
&= \frac{p(Y(k,l)\mid H_1(k,l),\{F\})\,p(H_1(k,l)\mid\{F\})}{p(Y(k,l)\mid H_1(k,l),\{F\})\,p(H_1(k,l)\mid\{F\}) + p(Y(k,l)\mid H_0(k,l),\{F\})\,p(H_0(k,l)\mid\{F\})} \\
&= \frac{p(Y(k,l)\mid H_1(k,l),\{F\})\,p}{p(Y(k,l)\mid H_1(k,l),\{F\})\,p + p(Y(k,l)\mid H_0(k,l),\{F\})\,(1-p)} \\
&= \frac{p\,\Delta(k,l)}{p\,\Delta(k,l) + (1-p)} \\
&= \frac{1}{1+(1-p)/(p\,\Delta(k,l))}
\end{aligned} \tag{4}$$

where $p$ is the speech presence probability conditioned on the speech features $\{F\}$ (that is, the prior for the probability conditioned on $\{Y,F\}$), and $\Delta(k,l)$ is the likelihood ratio:
$$p(H_1(k,l)\mid\{F\})=p \tag{5}$$

$$\Delta(k,l)=\frac{p(Y(k,l)\mid H_1(k,l),\{F\})}{p(Y(k,l)\mid H_0(k,l),\{F\})} \tag{6}$$

Assuming that the short-time Fourier transform coefficients of the clean speech and the noise follow complex Gaussian distributions and are uncorrelated, the probability density functions are:

$$\begin{aligned} p(Y(k,l)\mid H_0(k,l),\{F\}) &= \frac{1}{\pi\lambda_d(k,l)}\exp\left(-\frac{|Y(k,l)|^2}{\lambda_d(k,l)}\right), \\ p(Y(k,l)\mid H_1(k,l),\{F\}) &= \frac{1}{\pi(\lambda_x(k,l)+\lambda_d(k,l))}\exp\left(-\frac{|Y(k,l)|^2}{\lambda_x(k,l)+\lambda_d(k,l)}\right) \end{aligned} \tag{7}$$

Substituting Eq. (7) into Eq. (6) gives the likelihood ratio:
$$\begin{aligned} \Delta(k,l) &= \frac{p(Y(k,l)\mid H_1(k,l),\{F\})}{p(Y(k,l)\mid H_0(k,l),\{F\})} \\ &= \frac{\lambda_d(k,l)}{\lambda_x(k,l)+\lambda_d(k,l)}\exp\left(\frac{|Y(k,l)|^2}{\lambda_d(k,l)} - \frac{|Y(k,l)|^2}{\lambda_x(k,l)+\lambda_d(k,l)}\right) \\ &= \frac{1}{1+\zeta(k,l)}\exp(v(k,l)) \end{aligned} \tag{8}$$

where $\zeta(k,l)=\frac{\lambda_x(k,l)}{\lambda_d(k,l)}$ is the a priori SNR, $\gamma(k,l)=\frac{|Y(k,l)|^2}{\lambda_d(k,l)}$ is the a posteriori SNR (not to be confused with the smoothing factor $\gamma$ in Eq. (2)), and $v(k,l)=\frac{\gamma(k,l)\zeta(k,l)}{1+\zeta(k,l)}$ is a function of both.
Evaluating Eq. (8) requires estimates of the a priori and a posteriori SNR. Once these are available, substituting Eq. (8) and Eq. (5) into Eq. (4) yields $p(H_1(k,l)\mid Y(k,l),\{F\})$.
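A small sketch of Eq. (8) followed by the last line of Eq. (4), for one frequency bin. The function names are illustrative, and the code has no guards against overflow for very large SNR values, so treat it as a reading aid rather than production code.

```python
import math


def likelihood_ratio(prior_snr, post_snr):
    """Eq. (8): Delta = exp(v) / (1 + zeta), with v = gamma * zeta / (1 + zeta)."""
    v = post_snr * prior_snr / (1.0 + prior_snr)
    return math.exp(v) / (1.0 + prior_snr)


def speech_presence_probability(prior_prob, lr):
    """Last line of Eq. (4): 1 / (1 + (1 - p) / (p * Delta))."""
    if prior_prob <= 0.0 or lr <= 0.0:
        return 0.0  # no prior support for speech
    return 1.0 / (1.0 + (1.0 - prior_prob) / (prior_prob * lr))


# A bin with a priori SNR 3 and a posteriori SNR 4, and an even prior.
delta = likelihood_ratio(3.0, 4.0)
print(speech_presence_probability(0.5, delta))
```

With a likelihood ratio of exactly 1 the posterior equals the prior, which is a quick sanity check on the formula.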

Estimating the speech presence probability conditioned on $\{F\}$ (Eq. 5)

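WebRTC uses three features: the likelihood ratio, the spectral flatness, and the spectral template difference. The weights of the three features are adjusted dynamically. The algorithm computes the value of each feature, updates the three feature thresholds online, and uses a mapping function to turn each feature value and its threshold into the speech presence probability conditioned on $\{F\}$ (which can also be read as the prior for the probability conditioned on $\{Y,F\}$).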

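The likelihood ratio is defined in Eq. (8). The log-likelihood ratio is first smoothed over time, and the smoothed values are then averaged over all frequencies to give the final feature: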
$$\begin{aligned} \mathrm{avg\_log\_lrt}(k,l) &= \mathrm{avg\_log\_lrt}(k,l-1) + 0.5\,\bigl(\mathrm{log\_lrt}(k,l) - \mathrm{avg\_log\_lrt}(k,l-1)\bigr) \\ F_1 = \mathrm{lrt}(l) &= \operatorname{mean}_k\bigl(\mathrm{avg\_log\_lrt}(k,l)\bigr) \end{aligned} \tag{8'}$$
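The smoothing and averaging step can be sketched as follows. The log-likelihood ratio per bin is taken as the logarithm of Eq. (8), i.e. v minus log(1 + zeta); the function name and argument layout are assumptions for illustration.

```python
import math


def update_avg_log_lrt(avg_log_lrt, prior_snr, post_snr, alpha=0.5):
    """Smooth the per-bin log-likelihood ratio over time and average over bins.

    Returns the updated per-bin averages and the scalar feature F1.
    """
    updated = []
    for avg, zeta, gamma in zip(avg_log_lrt, prior_snr, post_snr):
        # log of Eq. (8): v - log(1 + zeta), with v = gamma * zeta / (1 + zeta)
        log_lrt = gamma * zeta / (1.0 + zeta) - math.log(1.0 + zeta)
        updated.append(avg + alpha * (log_lrt - avg))
    feature = sum(updated) / len(updated)
    return updated, feature
```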

Stationary noise has weaker fundamental and harmonic components than speech, which helps tell the two apart. Spectral flatness is defined as the ratio of the geometric mean to the arithmetic mean of the magnitude spectrum over the frequency bins:

$$F_2 = \frac{\sqrt[K]{\prod_{k=1}^{K}|Y(k)|}}{\frac{1}{K}\sum_{k=1}^{K}|Y(k)|} \tag{9}$$

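The $K$-th root can be simplified by taking logarithms first and exponentiating afterwards. The geometric mean never exceeds the arithmetic mean, and the two are equal only when all values are the same. Speech has an uneven spectrum, so its spectral flatness is small; noise has a more uniform spectrum, so its spectral flatness is larger.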
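The log-then-exp trick for Eq. (9) looks like this in a minimal sketch. The small constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math


def spectral_flatness(magnitudes, eps=1e-12):
    """Geometric mean over arithmetic mean of a magnitude spectrum (Eq. 9)."""
    k = len(magnitudes)
    # K-th root of the product == exp(mean of the logs)
    log_sum = sum(math.log(m + eps) for m in magnitudes)
    geometric_mean = math.exp(log_sum / k)
    arithmetic_mean = sum(magnitudes) / k
    return geometric_mean / (arithmetic_mean + eps)


print(spectral_flatness([1.0, 1.0, 1.0, 1.0]))   # flat, noise-like spectrum
print(spectral_flatness([10.0, 0.1, 0.1, 0.1]))  # peaky, speech-like spectrum
```

The flat spectrum gives a value close to 1, while the peaky spectrum gives a much smaller value, which is the behaviour the feature relies on.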

The spectral template difference is the difference between the magnitude spectrum of the current frame and the magnitude spectrum of a noise template. Assuming the noise is stationary, the template can be learned from earlier frames in which the speech probability was low.

A tanh function maps each feature value into the interval [0, 1]:
$$M(F_n)=0.5\left(\tanh(\pm w(F_n-T_n)) + 1\right) \tag{10}$$

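where $T_n$ is the threshold of feature $n$ and $w$ is the width parameter. The final speech presence probability is the weighted sum of the three mapped features, with weights $w_n$: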
$$p=\sum_{n=1}^{3}w_n M(F_n) \tag{11}$$
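Eqs. (10) and (11) can be sketched as below. The sign chosen for each feature (positive for the likelihood ratio and the template difference, negative for the flatness, since speech has lower flatness) and all numeric values in the usage example are assumptions for illustration only.

```python
import math


def feature_map(value, threshold, width, sign=1.0):
    """Eq. (10): squash a feature into [0, 1] around its threshold."""
    return 0.5 * (math.tanh(sign * width * (value - threshold)) + 1.0)


def prior_speech_probability(features, thresholds, widths, signs, weights):
    """Eq. (11): weighted sum of the mapped features."""
    return sum(
        w * feature_map(f, t, wd, s)
        for f, t, wd, s, w in zip(features, thresholds, widths, signs, weights)
    )


# Hypothetical values: (likelihood ratio, flatness, template difference).
p = prior_speech_probability(
    features=(1.5, 0.3, 0.8),
    thresholds=(1.0, 0.5, 0.5),
    widths=(4.0, 4.0, 4.0),
    signs=(1.0, -1.0, 1.0),
    weights=(0.5, 0.25, 0.25),
)
print(p)
```

A feature sitting exactly on its threshold maps to 0.5, so the thresholds act as the decision points and the width controls how sharp the transition is.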

The threshold of each feature is determined from histogram statistics and is updated every 500 frames.

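In noise, the likelihood ratio is at most 1, so the mean likelihood ratio under noise serves as the threshold. In practice, the distribution of the likelihood ratio over 0~100 is collected in a histogram with 1000 bins. First the mean likelihood ratio over the lowest 10 bins (likelihood ratio between 0 and 1) is computed as average, then the mean over all bins as average_global. If the two differ little, every frame is treated as noise: the likelihood ratio threshold is set to 1 and low_lrt_fluctuations is set to 1 (it is used later for the spectral template difference). If they differ a lot, the threshold is set to average*1.2 and the result is clamped to 0.2~1.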
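A rough sketch of this threshold rule, following the description above rather than the WebRTC source. The text does not say how small the gap between the two averages must be, so `max_gap` is a hypothetical tolerance, and bin centres are used as the representative value of each bin.

```python
def lrt_threshold(histogram, bin_width=0.1, low_bins=10, max_gap=0.05):
    """Return (threshold, low_lrt_fluctuations) from a likelihood-ratio histogram.

    histogram : counts per bin, 1000 bins covering 0~100 in the text
    max_gap   : hypothetical tolerance for "the two averages differ little"
    """
    centers = [(i + 0.5) * bin_width for i in range(len(histogram))]
    total = sum(histogram)
    if total == 0:
        return 1.0, True  # assumed fallback when no frames were counted
    low_count = sum(histogram[:low_bins])
    low_sum = sum(c * n for c, n in zip(centers[:low_bins], histogram[:low_bins]))
    average = low_sum / low_count if low_count else 0.0
    average_global = sum(c * n for c, n in zip(centers, histogram)) / total
    if abs(average_global - average) < max_gap:
        return 1.0, True  # everything looks like noise
    return min(1.0, max(0.2, 1.2 * average)), False
```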

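The spectral flatness threshold is the spectral flatness of the noise, which is taken to be the most frequent value, i.e. the abscissa of the histogram maximum. In practice only flatness values in 0~50 are counted, in a histogram with 1000 bins. The positions of the largest and the second-largest counts are located. If the two are adjacent and the second-largest count is more than half of the largest, the two bins are merged: the merged count is the sum of the two counts, and the abscissa is the midpoint of the two bin positions. Otherwise no merge takes place and the count and abscissa of the maximum are used.
Once the weight spectral_flatness_peak_weight (the count) and the position spectral_flatness_peak_position (the abscissa) are known, the following rule decides whether the spectral flatness feature is used:
if spectral_flatness_peak_weight < 0.3f * 500 or spectral_flatness_peak_position < 0.6f, the feature is not used; otherwise it is used.
If the feature is used, the flatness threshold is flatness_threshold = 0.9 * spectral_flatness_peak_position, clamped to 0.1~0.9.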
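The peak search, the merge rule, and the usage decision can be sketched as follows. This follows the prose description only; the bin width of 0.05 (1000 bins over 0~50) and the use of bin centres as abscissas are assumptions.

```python
def flatness_threshold(histogram, bin_width=0.05, frames=500):
    """Return (use_feature, threshold) from a spectral-flatness histogram."""
    centers = [(i + 0.5) * bin_width for i in range(len(histogram))]
    # Indices of the largest and second-largest counts.
    order = sorted(range(len(histogram)), key=lambda i: histogram[i], reverse=True)
    first, second = order[0], order[1]
    weight = histogram[first]
    position = centers[first]
    # Merge two adjacent peaks when the second is more than half of the first.
    if abs(first - second) == 1 and histogram[second] > 0.5 * histogram[first]:
        weight += histogram[second]
        position = 0.5 * (centers[first] + centers[second])
    if weight < 0.3 * frames or position < 0.6:
        return False, None  # feature not used for the next update period
    return True, min(0.9, max(0.1, 0.9 * position))
```

The spectral template difference threshold could reuse the same peak search with its own range, scale factor, and clamp limits.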

The spectral template difference threshold is determined in the same way as the spectral flatness threshold, with the following differences:

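1) Only spectral template differences in 0~100 are counted.
2) The usage rule is use_spec_diff = spectral_diff_peak_weight < 0.3f * 500 || low_lrt_fluctuations ? 0 : 1; that is, the feature is used only when the peak holds at least 30% of the 500 frames and the likelihood ratio fluctuates sufficiently.
3) The spectral template difference threshold equals 1.2*spectral_diff_peak_position, clamped to 0.16~1.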

Likelihood ratio computation (Eq. 8)

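Computing the features requires the a priori and a posteriori SNR, which in turn require a noise estimate. WebRTC uses an adaptive quantile method for this initial noise estimate; see the separate post on WebRTC quantile noise estimation for details.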

Given the initial noise estimate $d_{qbne}(k,l)$ from the quantile method, the a posteriori SNR is simply the energy of the current frame divided by the noise energy:
$$\gamma(k,l) =\frac{|Y(k,l)|^2}{d_{qbne}(k,l)} \tag{12}$$

The a priori SNR is estimated with the decision-directed (DD) rule:
$$\zeta(k,l) =0.98\,\frac{|Y(k,l-1)|^2G(k,l-1)}{d_{qbne}(k,l-1)} + (1-0.98)\,\frac{|Y(k,l)|^2}{d_{qbne}(k,l)} \tag{13}$$
Substituting Eq. (12) and Eq. (13) into Eq. (8) gives the likelihood ratio.
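Eqs. (12) and (13) translate almost directly into code. The sketch below follows the formulas as written in this post; the function name, argument names, and the small `eps` guard against division by zero are assumptions.

```python
def snr_estimates(power, noise, prev_power, prev_noise, prev_gain,
                  alpha=0.98, eps=1e-12):
    """Return (prior_snr, post_snr) per bin.

    power, noise           : |Y(k,l)|^2 and d_qbne(k,l) for the current frame
    prev_power, prev_noise : the same quantities for the previous frame
    prev_gain              : gain G(k,l-1) applied to the previous frame
    """
    post_snr = [y2 / (d + eps) for y2, d in zip(power, noise)]  # Eq. (12)
    prior_snr = [
        alpha * py2 * g / (pd + eps) + (1.0 - alpha) * post      # Eq. (13)
        for py2, pd, g, post in zip(prev_power, prev_noise, prev_gain, post_snr)
    ]
    return prior_snr, post_snr
```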

The derivation above is presented backwards, starting from the result. The actual computation runs in the opposite order:

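1) Estimate the initial noise with the quantile noise estimation method.
2) Compute the a priori and a posteriori SNR with Eq. (13) and Eq. (12).
3) Feed the SNR estimates and the spectral energy into Eqs. (8)-(11) to compute the likelihood ratio and the speech presence probability conditioned on the features $\{F\}$ (the prior for the probability conditioned on $\{Y,F\}$).
4) Substitute the likelihood ratio and the probability conditioned on $\{F\}$ into Eq. (4) to obtain the final speech presence probability.
5) Substitute the final speech presence probability into Eq. (2) to update the noise.
6) Substitute the noise estimate into Eq. (1) to compute the gain function.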
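Steps 2) to 6) can be strung together for a single frequency bin as a rough sketch. The initial quantile noise estimate from step 1) and the feature-based prior from step 3) are passed in as inputs; all names, the smoothing factor 0.9, and the use of power-domain quantities throughout are assumptions for illustration.

```python
import math


def process_bin(y_power, noise_init, prev_y_power, prev_noise, prev_gain,
                prior_prob, gamma_n=0.9, alpha=0.98, eps=1e-12):
    """One frame of the pipeline for one bin; returns (gain, noise, speech_prob)."""
    # Step 2: a posteriori (Eq. 12) and a priori (Eq. 13) SNR.
    post_snr = y_power / (noise_init + eps)
    prior_snr = (alpha * prev_y_power * prev_gain / (prev_noise + eps)
                 + (1.0 - alpha) * post_snr)
    # Step 3: likelihood ratio (Eq. 8); prior_prob stands in for Eq. (11).
    lr = math.exp(post_snr * prior_snr / (1.0 + prior_snr)) / (1.0 + prior_snr)
    # Step 4: speech presence probability (Eq. 4).
    speech_prob = 1.0 / (1.0 + (1.0 - prior_prob) / (prior_prob * lr + eps))
    # Step 5: noise update (Eq. 2).
    noise = gamma_n * prev_noise + (1.0 - gamma_n) * (
        (1.0 - speech_prob) * y_power + speech_prob * prev_noise)
    # Step 6: Wiener gain (Eq. 1), clipped to [0, 1].
    gain = min(1.0, max(0.0, 1.0 - noise / (y_power + eps)))
    return gain, noise, speech_prob


print(process_bin(4.0, 1.0, 4.0, 1.0, 0.75, 0.5))
```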

Original: https://blog.csdn.net/suijue9389/article/details/120715951
Author: myuzhao
Title: webrtc 降噪算法总结
