FBK and MFCC Features

FBK: filter bank

MFCC: Mel-frequency cepstral coefficients

How MFCC features are computed:

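The input is a speech waveform sampled at fs = 8000 Hz (consistent with 200 samples per 25 ms frame and the 4000 Hz upper frequency used below), shape=(14520,), i.e. about 1.8 s of audio.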

1. Pre-emphasis:

Why pre-emphasize: as sound travels through the air, its high-frequency components are attenuated more than its low-frequency ones, and pre-emphasis boosts the high frequencies to restore this loss.

How to pre-emphasize:

pre_emphasis_coeff = 0.95 # values of 0.95 to 0.97 are common
y(n) = x(n) - pre_emphasis_coeff * x(n-1)

Shape after pre-emphasis: (14520,)
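A minimal NumPy sketch of the pre-emphasis step above, assuming the waveform is a 1-D float array. The first sample has no predecessor, so this sketch keeps it unchanged (a common convention that the original does not specify); the output therefore has the same shape as the input. The random signal is only a stand-in for real speech.

```python
import numpy as np

def pre_emphasis(x: np.ndarray, coeff: float = 0.95) -> np.ndarray:
    """Apply y[n] = x[n] - coeff * x[n-1]; y[0] = x[0] since it has no predecessor."""
    x = np.asarray(x, dtype=np.float64)
    return np.append(x[0], x[1:] - coeff * x[:-1])

# Usage with a random stand-in for the 14520-sample waveform.
rng = np.random.default_rng(0)
signal = rng.standard_normal(14520)
emphasized = pre_emphasis(signal)
print(emphasized.shape)  # (14520,)
```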

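2. Framing:

fs = 8000 # sampling rate (Hz)
frame_len = 25 # each frame length (ms)
frame_shift = 10 # frame shift length (ms)
frame_len_samples = frame_len*fs//1000 # each frame length (samples) =200
frame_shift_samples = frame_shift*fs//1000 # frame shift length (samples) =80

After framing, shape=(180, 200): the utterance is split into 180 frames of 200 samples each. Consecutive frames overlap by 200 - 80 = 120 samples, and the frame count is 1 + (14520 - 200) / 80 = 180.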
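The framing step can be sketched as follows, assuming fs = 8000 Hz (which is what 200 samples per 25 ms implies). The helper name `frame_signal` and the zero-padding of an incomplete last frame are illustrative choices, not taken from the original; for the 14520-sample input no padding is needed, since (14520 - 200) / 80 is exactly 179.

```python
import numpy as np

def frame_signal(x, fs=8000, frame_len_ms=25, frame_shift_ms=10):
    """Split a 1-D signal into overlapping frames, zero-padding the tail if needed."""
    frame_len = frame_len_ms * fs // 1000      # 200 samples at 8 kHz
    frame_shift = frame_shift_ms * fs // 1000  # 80 samples at 8 kHz
    num_frames = 1 + int(np.ceil(max(len(x) - frame_len, 0) / frame_shift))
    # Zero-pad so that the last frame is complete.
    pad_len = (num_frames - 1) * frame_shift + frame_len - len(x)
    padded = np.concatenate([np.asarray(x, dtype=float), np.zeros(pad_len)])
    # Row i holds samples [i * frame_shift, i * frame_shift + frame_len).
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(num_frames)[:, None]
    return padded[idx]

frames = frame_signal(np.zeros(14520))
print(frames.shape)  # (180, 200)
```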

3. Windowing

Why window: the window function is smooth, so the samples at both ends of each frame taper smoothly toward zero instead of being cut off abruptly. This lowers the sidelobes (spectral leakage) after the Fourier transform and yields a higher-quality spectrum.
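To see the sidelobe claim numerically, here is a small experiment under assumed values: fs = 8000 Hz, a 1010 Hz test tone that does not fall on a DFT bin, a 200-sample frame, and the article's 512-point DFT. It compares how much energy leaks far away from the tone with a rectangular window (no windowing) versus a Hamming window; `far_leakage` is a hypothetical helper written only for this illustration.

```python
import numpy as np

fs = 8000    # assumed sampling rate (Hz)
N = 200      # samples per frame (25 ms at 8 kHz)
K = 512      # DFT length, as in the article
f0 = 1010.0  # assumed test tone; not a multiple of fs / N = 40 Hz

frame = np.cos(2 * np.pi * f0 * np.arange(N) / fs)

def far_leakage(window):
    """Largest spectral magnitude at least 1000 Hz away from f0, relative to the peak."""
    spec = np.abs(np.fft.rfft(frame * window, n=K))
    freqs = np.fft.rfftfreq(K, d=1 / fs)
    far = np.abs(freqs - f0) >= 1000
    return spec[far].max() / spec.max()

rect_leak = far_leakage(np.ones(N))     # no windowing = rectangular window
hamm_leak = far_leakage(np.hamming(N))
print(f"rectangular: {20 * np.log10(rect_leak):.1f} dB")
print(f"Hamming:     {20 * np.log10(hamm_leak):.1f} dB")
```

With the Hamming window the far-off leakage should come out markedly lower; the exact dB values depend on the tone frequency chosen.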

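Hamming window:

[Figure: the Hamming window]

The window length is also 200, the same as the number of samples per frame. The window peaks at about 1 in the middle and tapers toward both ends; for a Hamming window the end values are 0.08 rather than exactly zero.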
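The Hamming window is a raised cosine. A sketch of the standard symmetric definition, checked against `np.hamming`, shows the 0.08 end values and the near-1 peak:

```python
import numpy as np

N = 200  # window length = samples per frame
n = np.arange(N)
# Symmetric Hamming window: w[n] = 0.54 - 0.46 * cos(2 * pi * n / (N - 1))
w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

print(np.allclose(w, np.hamming(N)))  # True
print(w[0], w[-1])  # both approximately 0.08, not 0
print(w.max())      # slightly below 1, because N is even and the exact centre is not sampled
```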

[Figure: a frame before windowing]

[Figure: the frame after windowing]

You can see that after windowing, both ends of the frame taper off toward zero.

Shape after windowing: (180, 200)
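Applying the window to all frames at once is a single broadcast multiplication: a (180, 200) array times a (200,) window multiplies every row by the same window. The random `frames` array below is only a stand-in for the real framed signal.

```python
import numpy as np

# Stand-in for the (180, 200) framed signal from the previous step.
rng = np.random.default_rng(0)
frames = rng.standard_normal((180, 200))

# Broadcasting multiplies every row (frame) by the same 200-point Hamming window.
windowed = frames * np.hamming(frames.shape[1])
print(windowed.shape)  # (180, 200)
```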

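4. (Discrete) Fourier transform

Physical meaning: it converts the signal from the time domain to the frequency domain, giving the amplitude and phase at each frequency (each frequency bin).

K=512 # length of DFT; each 200-sample frame is zero-padded to 512 points

Shape after the Fourier transform: (180, 257)

The second axis has 257 entries because the DFT of a real signal is conjugate-symmetric, so only the K/2 + 1 = 257 non-negative frequency bins (0 to fs/2 = 4000 Hz, spaced fs/K = 15.625 Hz apart) need to be kept. The result is complex: the magnitude (modulus) of each value is the amplitude of the sinusoidal component at that bin, and its angle (argument) is the phase. The real and imaginary parts are not the amplitude and phase themselves.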
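A NumPy sketch of this step, assuming the windowed frames are a real (180, 200) array (random here, as a stand-in) and fs = 8000 Hz. `np.fft.rfft` with `n=512` zero-pads each frame and returns only the 257 non-negative-frequency bins; amplitude and phase come from `np.abs` and `np.angle`, not from the real and imaginary parts directly.

```python
import numpy as np

fs = 8000  # assumed sampling rate (Hz)
K = 512    # DFT length: each 200-sample frame is zero-padded to 512 samples
rng = np.random.default_rng(0)
windowed = rng.standard_normal((180, 200))  # stand-in for the windowed frames

# For real input, rfft returns only the K // 2 + 1 = 257 non-negative frequency bins.
freq_domain_data = np.fft.rfft(windowed, n=K, axis=1)
print(freq_domain_data.shape)  # (180, 257)

amplitude = np.abs(freq_domain_data)   # sqrt(real**2 + imag**2)
phase = np.angle(freq_domain_data)     # atan2(imag, real), in radians
bin_hz = np.fft.rfftfreq(K, d=1 / fs)  # bin frequencies: 0, fs/K, ..., fs/2
```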

5. Compute the power spectrum

Get the energy at each frequency (each frequency bin).

Calculation: the sum of the squares of the real and imaginary parts of each complex value, i.e. its squared magnitude, here also scaled by 1/K.

power_spec = np.absolute(freq_domain_data) ** 2 * (1/K) # power spectrum

Shape of the power spectrum: (180, 257)
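A short check, using a random stand-in spectrum, that `np.absolute(...) ** 2` is exactly the sum of the squared real and imaginary parts, scaled by 1/K as in the line above:

```python
import numpy as np

K = 512
rng = np.random.default_rng(0)
# Stand-in for the (180, 257) complex spectrum from the previous step.
freq_domain_data = np.fft.rfft(rng.standard_normal((180, 200)), n=K, axis=1)

power_spec = np.absolute(freq_domain_data) ** 2 * (1 / K)  # as in the article
# Equivalent form: (real part squared + imaginary part squared) / K
power_alt = (freq_domain_data.real ** 2 + freq_domain_data.imag ** 2) / K
print(power_spec.shape)                    # (180, 257)
print(np.allclose(power_spec, power_alt))  # True
```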

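6. Mel filtering (Mel filterbank)

WHY: the human ear is not equally sensitive to all frequencies. It is more sensitive to the low-frequency part of speech and less sensitive to the high-frequency part.

Mel frequency and ordinary frequency in Hz are in one-to-one correspondence, computed as follows: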

""" Apply the mel filterbank to the power spectrum and sum the energy in each filter.

    The Mel scale relates perceived frequency, or pitch, of a pure tone to its actual measured frequency.

    Humans are much better at discerning small changes in pitch at low frequencies than they are at high frequencies.

    Incorporating this scale makes our features match more closely what humans hear.

    The formula for converting from frequency to Mel scale is:
        M(f) = 2595*log10(1+f/700)
    And formula for converting from Mel scale to frequency is:
        F(m) = 700*(10**(m/2595)-1)
"""
low_frequency = 20 # Start at 20 Hz rather than 0 Hz, because the human ear can barely perceive very low-frequency signals.

high_frequency = fs//2 # If the speech is sampled at fs Hz, the upper frequency is limited to fs/2 Hz (Nyquist). =4000
low_frequency_mel = 2595 * np.log10(1 + low_frequency / 700) # =31.75
high_frequency_mel = 2595 * np.log10(1 + high_frequency / 700) # =2146.06
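The two conversion formulas from the docstring can be wrapped as helper functions (the names `hz_to_mel` and `mel_to_hz` are illustrative). A round trip recovers the original frequencies, and the values for 20 Hz and 4000 Hz agree with the comments above up to rounding.

```python
import numpy as np

def hz_to_mel(f):
    """M(f) = 2595 * log10(1 + f / 700)"""
    return 2595 * np.log10(1 + np.asarray(f, dtype=float) / 700)

def mel_to_hz(m):
    """F(m) = 700 * (10 ** (m / 2595) - 1)"""
    return 700 * (10 ** (np.asarray(m, dtype=float) / 2595) - 1)

print(round(float(hz_to_mel(20)), 2))    # 31.75
print(round(float(hz_to_mel(4000)), 2))  # 2146.06
print(np.allclose(mel_to_hz(hz_to_mel([20, 1000, 4000])), [20, 1000, 4000]))  # True
```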

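The frequency range [low_frequency, high_frequency] is divided into 40 frequency bands, and we compute the energy in each band. The band edges are evenly spaced on the Mel scale, so the bands are narrow at low frequencies and wide at high frequencies.

Calculation: construct a bank of filters (typically triangular and overlapping), then multiply the filters with the power spectrum.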


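Shape after Mel filtering: (180, 40), i.e. the (180, 257) power spectrum multiplied by a (257, 40) filter matrix.

The 40 on the second axis corresponds to the 40 frequency bands.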
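The article does not show how the filter matrix is built. Below is one common construction, sketched under assumptions: 40 triangular filters whose edges are evenly spaced on the Mel scale between 20 Hz and fs/2, with fs = 8000 Hz and K = 512, and Mel points mapped to DFT bins with `floor((K + 1) * f / fs)`, a mapping used in many NumPy tutorials. `mel_filterbank` is a hypothetical helper; other toolkits place and normalize the triangles differently.

```python
import numpy as np

def mel_filterbank(num_filters=40, K=512, fs=8000, low_hz=20, high_hz=None):
    """Triangular filters evenly spaced on the Mel scale, shape (num_filters, K // 2 + 1)."""
    high_hz = fs / 2 if high_hz is None else high_hz
    low_mel = 2595 * np.log10(1 + low_hz / 700)
    high_mel = 2595 * np.log10(1 + high_hz / 700)
    # num_filters + 2 points: each triangle uses (left edge, centre, right edge).
    mel_points = np.linspace(low_mel, high_mel, num_filters + 2)
    hz_points = 700 * (10 ** (mel_points / 2595) - 1)
    bins = np.floor((K + 1) * hz_points / fs).astype(int)  # Hz -> DFT bin index

    fbank = np.zeros((num_filters, K // 2 + 1))
    for m in range(1, num_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):   # rising edge: 0 -> 1
            fbank[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge: 1 -> 0
            fbank[m - 1, k] = (right - k) / (right - centre)
    return fbank

fbank = mel_filterbank()
rng = np.random.default_rng(0)
power_spec = rng.random((180, 257))     # stand-in for the power spectrum
mel_energies = power_spec @ fbank.T     # weighted energy sum per band
print(fbank.shape, mel_energies.shape)  # (40, 257) (180, 40)
```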

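7. Take the log

WHY: it rescales the vertical axis. The log compresses large values and stretches small ones, which amplifies energy differences at low energy levels; think of the shape of the log curve.

Shape after the log: (180, 40)

At this point we have the FBK features.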
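A sketch of the log step. Flooring the energies at a tiny epsilon is an added safeguard, not part of the original text: a band with zero energy would otherwise give `-inf`. Natural log is used here; some implementations use 10 * log10 (dB) instead, which only scales the result.

```python
import numpy as np

rng = np.random.default_rng(0)
mel_energies = rng.random((180, 40))  # stand-in for the (180, 40) Mel energies
mel_energies[0, 0] = 0.0              # a silent band: log(0) would be -inf

# Floor at machine epsilon before taking the natural log (an assumed safeguard).
eps = np.finfo(float).eps
log_fbank = np.log(np.maximum(mel_energies, eps))
print(log_fbank.shape)               # (180, 40)
print(np.isfinite(log_fbank).all())  # True
```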

8. Discrete cosine transform (DCT)

Generally, only the first 12 or 20 coefficients after the DCT are kept.

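num_ceps = 12 # number of MFCC dims kept: coefficients 2-13, i.e. indices 1..12 after dropping index 0

The higher-order coefficients are dropped because they represent fast changes in the filter bank coefficients and do not help speech models. The 0th coefficient is dropped as well in this code; it is proportional to the mean log filter bank energy of the frame.

from scipy.fftpack import dct
mfcc = dct(log_fbank, type=2, axis=1, norm="ortho")[:, 1 : (num_ceps + 1)]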

Shape after the DCT: (180, 12)

At this point we have the MFCC features.
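The line above calls `scipy.fftpack.dct(..., type=2, norm="ortho")`. To show what that computes, here is the same orthonormal DCT-II written out as a matrix with plain NumPy (`dct2_ortho_matrix` is a hypothetical helper, and the random input is a stand-in for the log filter bank energies). Multiplying each frame's 40 log energies by it and keeping columns 1 to 12 gives the (180, 12) MFCC array.

```python
import numpy as np

def dct2_ortho_matrix(n):
    """Orthonormal DCT-II matrix C; C @ x matches dct(x, type=2, norm='ortho')."""
    k = np.arange(n)[:, None]  # output (cepstral) index
    i = np.arange(n)[None, :]  # input (filter bank) index
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)    # the k = 0 row gets an extra 1/sqrt(2)
    return C

num_ceps = 12
rng = np.random.default_rng(0)
log_fbank = rng.standard_normal((180, 40))  # stand-in for the log Mel energies

C = dct2_ortho_matrix(log_fbank.shape[1])
cepstra = log_fbank @ C.T            # DCT-II of each frame (along axis 1)
mfcc = cepstra[:, 1 : num_ceps + 1]  # drop c0, keep c1..c12
print(mfcc.shape)                    # (180, 12)
```

Because the matrix is orthonormal, c0 is simply the frame's summed log energy divided by sqrt(40), which is why it behaves like an overall loudness term.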

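Summary:

1. MFCC features are computed from FBK features (by applying a DCT to them);

2. MFCC features have fewer dimensions than FBK features: FBK features are typically 40-dimensional, MFCC features typically 13-dimensional (12 in the example above).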

Original: https://blog.csdn.net/huang_yx005/article/details/122474081
Author: huang_yx005
Title: FBK和MFCC特征 (FBK and MFCC Features)
