# Summary of the WebRTC Noise Suppression Algorithm

## Wiener gain

WebRTC NS uses the Wiener gain. Following the Wiener filter derivation covered in an earlier post, the gain function is

$$
G(k,l)=\frac{|X(k,l)|^2}{|Y(k,l)|^2}=1-\frac{|N(k,l)|^2}{|Y(k,l)|^2} \tag{1}
$$

where $k$ is the frequency bin, $l$ is the frame index, and $Y$, $X$ and $N$ are the spectra of the noisy signal, the clean speech and the noise.
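As a minimal sketch of Eq. (1), the gain can be computed per frequency bin from the noisy power and a noise power estimate. The function name `wiener_gain`, the `floor` parameter and the clamping to `[floor, 1]` are assumptions added for illustration; Eq. (1) itself does not specify any clamping.

```python
def wiener_gain(noisy_power, noise_power, floor=0.0):
    """Per-bin gain G = 1 - |N|^2 / |Y|^2, clamped to [floor, 1].

    The clamp is an illustrative safeguard (an assumption): when the
    noise estimate exceeds the noisy power, Eq. (1) would go negative.
    """
    gains = []
    for y2, n2 in zip(noisy_power, noise_power):
        if y2 <= 0.0:
            # No energy in this bin: nothing to keep.
            gains.append(floor)
            continue
        g = 1.0 - n2 / y2
        gains.append(min(1.0, max(floor, g)))
    return gains


# Three bins with the same noise power and decreasing noisy power.
gains = wiener_gain([4.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```

Bins whose noisy power is well above the noise estimate keep a gain close to 1, while bins dominated by noise are pushed towards the floor.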

## Noise estimation

The noise estimate is updated by recursive smoothing over time, weighted by the speech presence probability:

$$
\hat{\lambda}_d(k,l)=\gamma\hat{\lambda}_d(k,l-1) + (1-\gamma)\big((1-p(k,l))Y(k,l) + p(k,l)\hat{\lambda}_d(k,l-1)\big) \tag{2}
$$

Here $\gamma$ is the smoothing factor, $Y(k,l)$ stands for the spectrum of the current noisy frame, $p(k,l)$ is the conditional speech presence probability estimated from $\{Y,F\}$, and $F$ denotes the speech features. When speech is likely ($p \to 1$) the previous estimate is kept; when speech is unlikely ($p \to 0$) the estimate moves towards the current frame.
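The recursion of Eq. (2) can be sketched as follows, with the previous-frame estimate on the right-hand side. The function name `update_noise` and the default `gamma=0.9` are illustrative assumptions, not values taken from the text above.

```python
def update_noise(prev_noise, noisy_spec, speech_prob, gamma=0.9):
    """One frame of the probability-weighted recursive noise update.

    prev_noise  -- noise estimate of the previous frame, per bin
    noisy_spec  -- spectrum of the current noisy frame, per bin
    speech_prob -- speech presence probability p(k, l), per bin
    gamma       -- smoothing factor (0.9 is an assumed example value)
    """
    updated = []
    for lam_prev, y, p in zip(prev_noise, noisy_spec, speech_prob):
        # Mix the current frame and the old estimate according to p.
        candidate = (1.0 - p) * y + p * lam_prev
        # Smooth the result against the old estimate.
        updated.append(gamma * lam_prev + (1.0 - gamma) * candidate)
    return updated


# With p = 1 the estimate is frozen; with p = 0 it tracks the frame.
frozen = update_noise([1.0], [3.0], [1.0])
tracking = update_noise([1.0], [3.0], [0.0], gamma=0.5)
```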

## Estimating the conditional speech presence probability given $\{Y,F\}$

Each time-frequency bin falls under one of two hypotheses, speech absent ($H_0$) or speech present ($H_1$):

$$
\begin{aligned}
H_0(k,l):\ Y(k,l) &= D(k,l) \\
H_1(k,l):\ Y(k,l) &= X(k,l) + D(k,l)
\end{aligned} \tag{3}
$$

Applying Bayes' rule, and abbreviating the prior as $p$ (Eq. 5) and the likelihood ratio as $\Delta(k,l)$ (Eq. 6), the speech presence probability is

$$
\begin{aligned}
p(H_1(k,l)|Y(k,l),\{F\}) &= \frac{p(Y(k,l)|H_1(k,l),\{F\})\,p(H_1(k,l)|\{F\})\,p(\{F\})}{p(Y(k,l)|H_1(k,l),\{F\})\,p(H_1(k,l)|\{F\})\,p(\{F\}) + p(Y(k,l)|H_0(k,l),\{F\})\,p(H_0(k,l)|\{F\})\,p(\{F\})} \\
&= \frac{p(Y(k,l)|H_1(k,l),\{F\})\,p(H_1(k,l)|\{F\})}{p(Y(k,l)|H_1(k,l),\{F\})\,p(H_1(k,l)|\{F\}) + p(Y(k,l)|H_0(k,l),\{F\})\,p(H_0(k,l)|\{F\})} \\
&= \frac{p(Y(k,l)|H_1(k,l),\{F\})\,p}{p(Y(k,l)|H_1(k,l),\{F\})\,p + p(Y(k,l)|H_0(k,l),\{F\})\,(1-p)} \\
&= \frac{p\Delta(k,l)}{p\Delta(k,l) + (1-p)} \\
&= \frac{1}{1 + (1-p)/(p\Delta(k,l))}
\end{aligned} \tag{4}
$$

$$
p(H_1(k,l)|\{F\})=p \tag{5}
$$

$$
\Delta(k,l)=\frac{p(Y(k,l)|H_1(k,l),\{F\})}{p(Y(k,l)|H_0(k,l),\{F\})} \tag{6}
$$
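The last line of Eq. (4) combines the prior $p$ of Eq. (5) with the likelihood ratio $\Delta$ of Eq. (6). A minimal sketch, where the function name and the handling of the degenerate cases (prior of 0 or 1, non-positive ratio) are assumptions added so the code never divides by zero:

```python
def speech_presence_probability(prior, likelihood_ratio):
    """Posterior speech probability 1 / (1 + (1 - p) / (p * delta))."""
    # Degenerate inputs are handled explicitly (an illustrative choice).
    if prior <= 0.0 or likelihood_ratio <= 0.0:
        return 0.0
    if prior >= 1.0:
        return 1.0
    return 1.0 / (1.0 + (1.0 - prior) / (prior * likelihood_ratio))


# A likelihood ratio of 1 leaves the prior unchanged; a ratio above 1
# raises the probability, a ratio below 1 lowers it.
unchanged = speech_presence_probability(0.5, 1.0)
raised = speech_presence_probability(0.5, 3.0)
lowered = speech_presence_probability(0.5, 1.0 / 3.0)
```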

Assuming that the short-time Fourier transform coefficients of clean speech and noise follow complex Gaussian distributions and are uncorrelated, the conditional probability density functions are

$$
\begin{aligned}
p(Y(k,l)|H_0(k,l),\{F\}) &= \frac{1}{\pi\lambda_d(k,l)}\exp\left(-\frac{|Y(k,l)|^2}{\lambda_d(k,l)}\right), \\
p(Y(k,l)|H_1(k,l),\{F\}) &= \frac{1}{\pi(\lambda_x(k,l)+\lambda_d(k,l))}\exp\left(-\frac{|Y(k,l)|^2}{\lambda_x(k,l)+\lambda_d(k,l)}\right)
\end{aligned} \tag{7}
$$

where $\lambda_x$ and $\lambda_d$ are the variances of the speech and noise coefficients. The likelihood ratio then becomes

$$
\begin{aligned}
\Delta(k,l) &= \frac{p(Y(k,l)|H_1(k,l),\{F\})}{p(Y(k,l)|H_0(k,l),\{F\})} \\
&= \frac{\lambda_d(k,l)}{\lambda_x(k,l)+\lambda_d(k,l)}\exp\left(\frac{|Y(k,l)|^2}{\lambda_d(k,l)} - \frac{|Y(k,l)|^2}{\lambda_x(k,l)+\lambda_d(k,l)}\right) \\
&= \frac{1}{1+\zeta(k,l)}\exp(v(k,l))
\end{aligned} \tag{8}
$$

with the prior SNR $\zeta(k,l)=\lambda_x(k,l)/\lambda_d(k,l)$, the posterior SNR $\gamma(k,l)=|Y(k,l)|^2/\lambda_d(k,l)$ (not to be confused with the smoothing factor $\gamma$ of Eq. 2), and $v(k,l)=\gamma(k,l)\,\zeta(k,l)/(1+\zeta(k,l))$.
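The closed form of Eq. (8) can be checked numerically against the ratio of the two Gaussian densities. The sketch below assumes the usual definitions: prior SNR $\zeta=\lambda_x/\lambda_d$, posterior SNR $\gamma=|Y|^2/\lambda_d$ and $v=\gamma\zeta/(1+\zeta)$. The function names are hypothetical.

```python
import math


def likelihood_ratio(prior_snr, post_snr):
    """Delta = exp(v) / (1 + zeta), with v = gamma * zeta / (1 + zeta)."""
    v = post_snr * prior_snr / (1.0 + prior_snr)
    return math.exp(v) / (1.0 + prior_snr)


def likelihood_ratio_from_variances(noisy_power, speech_var, noise_var):
    """Same ratio written directly in terms of the variances."""
    total = speech_var + noise_var
    return (noise_var / total) * math.exp(
        noisy_power / noise_var - noisy_power / total
    )


# Example: speech variance 1, noise variance 1, noisy power 2,
# which gives a prior SNR of 1 and a posterior SNR of 2.
closed_form = likelihood_ratio(prior_snr=1.0, post_snr=2.0)
direct = likelihood_ratio_from_variances(2.0, 1.0, 1.0)
```

Both expressions agree, and a prior SNR of zero gives a ratio of exactly 1, so the observation then carries no evidence either way.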

## Estimating the conditional speech presence probability given $\{F\}$ (Eq. 5)

WebRTC uses three features in this computation: the likelihood ratio, the spectral flatness, and the spectral template difference. The weights of the three features are adjusted dynamically. The value of each feature is computed, its threshold is updated online, and a mapping function turns each feature value and its threshold into the conditional speech presence probability given $\{F\}$ (which can also be read as the prior speech presence probability $p$ of Eq. 5).

The likelihood ratio feature is the average over frequency of the time-smoothed log likelihood ratio, where $\mathrm{log\_lrt}(k,l)$ is the logarithm of the likelihood ratio $\Delta(k,l)$:

$$
\begin{aligned}
\mathrm{avg\_log\_lrt}(k,l) &= \mathrm{avg\_log\_lrt}(k,l-1) + 0.5\,\big(\mathrm{log\_lrt}(k,l) - \mathrm{avg\_log\_lrt}(k,l-1)\big) \\
F_1 = \mathrm{lrt}(l) &= \mathrm{mean}_k\big(\mathrm{avg\_log\_lrt}(k,l)\big)
\end{aligned} \tag{8a}
$$
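The smoothing of the log likelihood ratio and its average over frequency can be sketched as below. The function names are hypothetical; the smoothing constant 0.5 is the one that appears in the formula.

```python
def update_avg_log_lrt(avg_prev, log_lrt, alpha=0.5):
    """Per-bin smoothing: avg += alpha * (log_lrt - avg)."""
    return [a + alpha * (x - a) for a, x in zip(avg_prev, log_lrt)]


def lrt_feature(avg_log_lrt):
    """Feature F1: mean of the smoothed log LRT over all bins."""
    return sum(avg_log_lrt) / len(avg_log_lrt)


# Two bins: the first moves halfway towards its new value,
# the second is already at its new value and stays put.
avg = update_avg_log_lrt([0.0, 2.0], [2.0, 2.0])
f1 = lrt_feature(avg)
```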

Stationary noise has weaker fundamental-frequency and harmonic components than speech, so its spectrum is flatter, and this can be used to distinguish speech from noise. Spectral flatness is defined as the ratio of the geometric mean to the arithmetic mean of the magnitude spectrum over the frequency bins:

$$
F_2 = \frac{\sqrt[K]{\prod_{k=1}^{K}|Y(k)|}}{\frac{1}{K}\sum_{k=1}^{K}|Y(k)|} \tag{9}
$$
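A minimal sketch of the spectral flatness of Eq. (9). The geometric mean is computed in the log domain to avoid underflow of the product; the small constant `eps` guarding against `log(0)` and division by zero is an assumption added for numerical safety.

```python
import math


def spectral_flatness(magnitudes, eps=1e-12):
    """Ratio of the geometric mean to the arithmetic mean of |Y(k)|."""
    k = len(magnitudes)
    # Geometric mean via the mean of logs.
    log_mean = sum(math.log(m + eps) for m in magnitudes) / k
    arithmetic_mean = sum(magnitudes) / k
    return math.exp(log_mean) / (arithmetic_mean + eps)


# A flat spectrum gives a value near 1; a spectrum dominated by
# one strong bin gives a much smaller value.
flat = spectral_flatness([1.0, 1.0, 1.0, 1.0])
peaky = spectral_flatness([10.0, 0.1, 0.1, 0.1])
```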

The spectral template difference measures how much the magnitude spectrum of the current frame deviates from a noise spectrum template. Assuming the noise is stationary, the template can be learned from earlier frames with a low speech probability.

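Each feature $F_n$ is mapped to a value between 0 and 1 by comparing it with its threshold $T_n$:

$$
M(F_n)=0.5\big(\tanh(\pm w(F_n-T_n)) + 1\big) \tag{10}
$$

Here $w$ controls how steep the transition around the threshold is, and the sign depends on whether larger or smaller feature values indicate speech. The mapped values are then combined with the feature weights $w_n$:

$$
p=\sum_{n=1}^{3}w_nM(F_n) \tag{11}
$$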
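Eqs. (10) and (11) can be sketched as a `tanh` mapping followed by a weighted sum. All names, and the convention of passing the sign as `+1.0` or `-1.0`, are assumptions made for illustration. A negative sign would suit a feature where smaller values suggest speech, which is plausibly the case for spectral flatness.

```python
import math


def feature_map(value, threshold, width, sign=1.0):
    """M(F) = 0.5 * (tanh(sign * width * (F - T)) + 1), in [0, 1]."""
    return 0.5 * (math.tanh(sign * width * (value - threshold)) + 1.0)


def prior_speech_probability(features, thresholds, widths, signs, weights):
    """p = sum over n of w_n * M(F_n)."""
    return sum(
        w * feature_map(f, t, wd, s)
        for f, t, wd, s, w in zip(features, thresholds, widths, signs, weights)
    )


# A feature exactly at its threshold maps to 0.5, so with weights
# summing to 1 the combined prior is 0.5 as well.
at_threshold = prior_speech_probability(
    features=[1.0, 0.5, 0.3],
    thresholds=[1.0, 0.5, 0.3],
    widths=[4.0, 4.0, 4.0],
    signs=[1.0, -1.0, 1.0],
    weights=[0.5, 0.25, 0.25],
)
```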

The threshold of each feature is determined from histogram statistics and is updated every 500 frames.

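The spectral template difference threshold is determined in the same way as the spectral flatness threshold, with the following differences:

1) Only spectral template difference values in the range 0 to 100 are counted in the histogram.
2) The feature is enabled by `use_spec_diff = spectral_diff_peak_weight < 0.3f * 500 || low_lrt_fluctuations ? 0 : 1;`, meaning that it is used only when the histogram peak accounts for at least 30% of the 500 frames and the likelihood ratio fluctuates sufficiently.
3) The threshold equals 1.2 * spectral_diff_peak_position, clamped to the range 0.16 to 1.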
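Rules 2) and 3) above translate into a few lines. The sketch below mirrors the C expression quoted in rule 2) and the clamp of rule 3). The function name and the `num_frames` parameter are hypothetical, and the histogram that produces the peak position and weight is not shown.

```python
def spectral_diff_decision(peak_position, peak_weight,
                           low_lrt_fluctuations, num_frames=500):
    """Return (use_spec_diff, threshold) for the template difference.

    peak_position        -- histogram peak position of the feature
    peak_weight          -- number of frames that fall on the peak
    low_lrt_fluctuations -- True when the LRT barely fluctuates
    """
    # Same logic as: weight < 0.3f * 500 || low_lrt_fluctuations ? 0 : 1
    if peak_weight < 0.3 * num_frames or low_lrt_fluctuations:
        use_spec_diff = 0
    else:
        use_spec_diff = 1
    # Threshold is 1.2 times the peak position, clamped to [0.16, 1].
    threshold = min(1.0, max(0.16, 1.2 * peak_position))
    return use_spec_diff, threshold


enabled = spectral_diff_decision(0.5, 200, False)
weak_peak = spectral_diff_decision(0.05, 100, False)
steady_lrt = spectral_diff_decision(2.0, 200, True)
```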

## Computing the likelihood ratio (Eq. 8)

The posterior SNR is computed from the current frame and the noise power $d_{qbne}$ estimated by the quantile-based method:

$$
\gamma(k,l) =\frac{|Y(k,l)|^2}{d_{qbne}(k,l)} \tag{12}
$$

The prior SNR is a weighted combination of the previous frame's gain-weighted SNR and the current posterior SNR:

$$
\zeta(k,l) =0.98\,\frac{|Y(k,l-1)|^2G(k,l-1)}{d_{qbne}(k,l-1)} + (1-0.98)\,\frac{|Y(k,l)|^2}{d_{qbne}(k,l)} \tag{13}
$$
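The two SNR estimates of Eqs. (12) and (13) can be sketched for a single bin as follows. The function names and the `eps` guard against division by zero are assumptions; the weight 0.98 is the one from Eq. (13).

```python
def posterior_snr(noisy_power, noise_power, eps=1e-12):
    """Eq. (12): current noisy power over the noise estimate."""
    return noisy_power / (noise_power + eps)


def prior_snr(prev_noisy_power, prev_gain, prev_noise_power,
              noisy_power, noise_power, alpha=0.98, eps=1e-12):
    """Eq. (13): mix of the previous gain-weighted SNR and Eq. (12)."""
    previous = prev_noisy_power * prev_gain / (prev_noise_power + eps)
    current = posterior_snr(noisy_power, noise_power, eps)
    return alpha * previous + (1.0 - alpha) * current


# Previous frame: power 4, gain 0.5, noise 1 (previous term = 2).
# Current frame: power 2, noise 1 (posterior SNR = 2).
post = posterior_snr(2.0, 1.0)
prior = prior_snr(4.0, 0.5, 1.0, 2.0, 1.0)
```

Because the weight on the previous frame is close to 1, the prior SNR changes slowly from frame to frame, which smooths the resulting gain.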

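The formulas above were derived in reverse order, starting from the final gain. The actual computation runs in the opposite order:

1) Estimate the initial noise with the quantile noise estimation method.
2) Compute the prior and posterior SNRs with Eqs. (13) and (12).
3) Substitute the prior and posterior SNRs and the spectral energy into Eqs. (8)-(11) to obtain the conditional speech presence probability based on the features $\{F\}$.
4) Substitute the likelihood ratio and the conditional speech presence probability based on $\{F\}$ into Eq. (4) to compute the final speech presence probability.
5) Substitute the final speech presence probability into Eq. (2) to update the noise.
6) Substitute the noise estimate into Eq. (1) to compute the gain function.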
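The steps above can be strung together for a single frequency bin as in the sketch below. It is a simplified illustration under several assumptions, not the WebRTC implementation: the initial noise estimate is passed in instead of coming from a quantile estimator, the feature-based prior is passed in as `prior_prob` instead of being computed from Eqs. (10) and (11), the same noise estimate is used for the previous-frame term of Eq. (13), the exponent is capped at 50 to avoid overflow, and the gain is clamped to [0, 1]. All names and default values are hypothetical.

```python
import math


def process_bin(noisy_power, noise_power, prev_noisy_power, prev_gain,
                prior_prob, gamma=0.9, dd=0.98, eps=1e-12):
    """Run steps 2 to 6 for one bin; returns (speech_prob, noise, gain)."""
    # Step 2: posterior SNR (Eq. 12) and prior SNR (Eq. 13).
    post_snr = noisy_power / (noise_power + eps)
    prev_term = prev_noisy_power * prev_gain / (noise_power + eps)
    prior_snr = dd * prev_term + (1.0 - dd) * post_snr

    # Step 3 (likelihood ratio only, Eq. 8); exponent capped at 50.
    v = min(50.0, post_snr * prior_snr / (1.0 + prior_snr))
    lrt = math.exp(v) / (1.0 + prior_snr)

    # Step 4: final speech presence probability (Eq. 4).
    weighted = prior_prob * lrt
    speech_prob = weighted / (weighted + (1.0 - prior_prob))

    # Step 5: probability-weighted noise update (Eq. 2).
    candidate = (1.0 - speech_prob) * noisy_power + speech_prob * noise_power
    new_noise = gamma * noise_power + (1.0 - gamma) * candidate

    # Step 6: Wiener gain (Eq. 1), clamped to [0, 1].
    gain = min(1.0, max(0.0, 1.0 - new_noise / (noisy_power + eps)))
    return speech_prob, new_noise, gain


# A loud bin (likely speech) and a bin at the noise level.
loud = process_bin(100.0, 1.0, 100.0, 0.99, prior_prob=0.5)
quiet = process_bin(1.0, 1.0, 1.0, 0.0, prior_prob=0.5)
```

In the loud bin the speech probability is close to 1, so the noise estimate barely moves and the gain stays high. In the quiet bin the probability stays near the prior and the gain drops to about zero.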

Original: https://blog.csdn.net/suijue9389/article/details/120715951
Author: myuzhao
Title: Summary of the WebRTC Noise Suppression Algorithm

