FBK and MFCC Features

May 23, 2023, 9:40 PM • Artificial Intelligence

FBK: filter bank

MFCC: Mel-frequency cepstral coefficients

Computing MFCC features:

The input is a speech waveform of shape=(14520,), sampled at fs = 8000 Hz (about 1.8 s of audio).

1. Pre-emphasis:

Why pre-emphasize: when sound travels through the air, its high-frequency components are attenuated more than its low-frequency ones, so this attenuation should be compensated by boosting the high frequencies.

How to pre-emphasize:

pre_emphasis_coeff = 0.95
y(n) = x(n) - pre_emphasis_coeff * x(n-1)  # x: input samples, y: pre-emphasized output

Shape after pre-emphasis: (14520,)
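A minimal NumPy sketch of the pre-emphasis filter, assuming the waveform is a 1-D array. The helper name `pre_emphasis` is hypothetical, and keeping the first sample unchanged (the post does not say how x(-1) is handled) is a common convention rather than something the original specifies.

```python
import numpy as np

def pre_emphasis(x, coeff=0.95):
    """Apply y(n) = x(n) - coeff * x(n-1); the first sample is kept unchanged (assumption)."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - coeff * x[:-1])

# Random noise stands in for the 14520-sample speech waveform
signal = np.random.default_rng(0).standard_normal(14520)
emphasized = pre_emphasis(signal)
print(emphasized.shape)  # (14520,)
```

The output has the same length as the input, matching the shape given above.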

2. Framing:

fs = 8000 # sampling rate (Hz), implied by the 200-sample frames below
frame_len = 25 # each frame length (ms)
frame_shift = 10 # frame shift length (ms)
frame_len_samples = frame_len*fs//1000 # each frame length (samples) = 200
frame_shift_samples = frame_shift*fs//1000 # frame shift length (samples) = 80

Shape after framing: (180, 200), i.e. the utterance is split into 180 frames of 200 samples each.
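The framing arithmetic can be checked with a short sketch. The helper `frame_signal` is hypothetical; it assumes fs = 8000 Hz and zero-pads the tail when the last frame would be incomplete. For 14520 samples no padding is needed, since (14520 - 200) / 80 = 179 exactly, giving 1 + 179 = 180 frames.

```python
import numpy as np

def frame_signal(x, fs=8000, frame_len_ms=25, frame_shift_ms=10):
    """Split a 1-D signal into overlapping frames of shape (num_frames, frame_len)."""
    frame_len = frame_len_ms * fs // 1000      # 200 samples at 8 kHz
    frame_shift = frame_shift_ms * fs // 1000  # 80 samples at 8 kHz
    # Enough frames to cover the whole signal (assumes len(x) >= frame_len)
    num_frames = 1 + int(np.ceil((len(x) - frame_len) / frame_shift))
    # Zero-pad the tail so that the last frame is complete (assumption)
    pad = (num_frames - 1) * frame_shift + frame_len - len(x)
    x = np.append(x, np.zeros(pad))
    # Row i holds samples [i * frame_shift, i * frame_shift + frame_len)
    idx = frame_shift * np.arange(num_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]

# Sample indices as the "signal" make it easy to see where each frame starts
frames = frame_signal(np.arange(14520, dtype=float))
print(frames.shape)  # (180, 200)
```

Consecutive frames overlap by 200 - 80 = 120 samples, so `frames[1, 0]` is sample 80.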

3. Windowing

Why apply a window: the window function is smooth and tapers the samples at both ends of each frame toward zero. This reduces the sidelobe levels (spectral leakage) after the Fourier transform and yields a higher-quality spectrum.

Hamming window:

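The window length is also 200, the same as the number of samples per frame. It peaks at about 1 in the middle and tapers toward both ends (for the Hamming window the end values are 0.08, close to but not exactly zero).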
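The Hamming window has a closed form, w(n) = 0.54 - 0.46 cos(2πn / (N - 1)). This sketch evaluates it directly for N = 200 and compares it with NumPy's built-in `np.hamming`, which uses the same symmetric definition.

```python
import numpy as np

N = 200  # window length = samples per frame
n = np.arange(N)
# Hamming window: w(n) = 0.54 - 0.46 * cos(2 * pi * n / (N - 1))
hamming = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

print(np.allclose(hamming, np.hamming(N)))           # True
print(round(hamming[0], 4), round(hamming[-1], 4))   # 0.08 0.08
print(round(hamming.max(), 4))                       # 0.9999 (just under 1 for even N)
```

Because N is even, no sample falls exactly on the centre of the window, so the maximum is slightly below 1.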

[Figure: a frame before windowing]

[Figure: the same frame after windowing]

As the figures show, after windowing both ends of the frame taper gradually toward zero.

Shape after windowing: (180, 200)
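Applying the window to every frame is a single broadcast multiplication. In this sketch an all-ones array stands in for the real frames, so each output row is just a copy of the window; that makes it easy to see the edges fall to about 0.08 while the middle stays near 1.

```python
import numpy as np

frames = np.ones((180, 200))          # stand-in for the (180, 200) frames
window = np.hamming(frames.shape[1])  # one 200-point Hamming window
windowed = frames * window            # broadcasting multiplies every frame (row) by the same window
print(windowed.shape)                 # (180, 200)
print(round(windowed[0, 0], 2), round(windowed[0, 100], 2))  # 0.08 1.0
```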

(Discrete) Fourier transform

Physical meaning: the transform converts the signal from the time domain to the frequency domain, giving the amplitude and phase at each frequency (each frequency bin).

K=512 # length of DFT

Shape after the Fourier transform: (180, 257)

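The size of the second axis is 257 = K/2 + 1, meaning we get the amplitude and phase at 257 frequency bins: the spectrum of a real-valued signal is conjugate-symmetric, so only the non-negative frequencies need to be kept (each 200-sample frame is zero-padded to K = 512 before the transform). The result is complex: the magnitude of each value gives the amplitude of the sinusoidal component at that frequency bin, and its angle (argument) gives the phase.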
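A sketch of this step with NumPy's `np.fft.rfft`, which returns only the K/2 + 1 non-negative frequency bins of a real input and zero-pads each frame to `n=K`. Random data stands in for the windowed frames; the pure 1000 Hz tone is an illustrative choice showing that the bin spacing is fs/K = 15.625 Hz at the assumed fs = 8000 Hz.

```python
import numpy as np

fs, K = 8000, 512
windowed = np.random.default_rng(0).standard_normal((180, 200))  # stand-in for the windowed frames

# Each 200-sample frame is zero-padded to K samples; only bins 0..K/2 are returned
spectrum = np.fft.rfft(windowed, n=K, axis=1)
print(spectrum.shape)  # (180, 257)

amplitude = np.abs(spectrum)  # magnitude of each complex value -> amplitude
phase = np.angle(spectrum)    # argument of each complex value -> phase (radians)

# A windowed 1000 Hz tone peaks in bin 1000 / (fs / K) = 64
tone = np.sin(2 * np.pi * 1000 * np.arange(200) / fs) * np.hamming(200)
peak_bin = int(np.argmax(np.abs(np.fft.rfft(tone, n=K))))
print(peak_bin, peak_bin * fs / K)  # 64 1000.0
```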

4. Compute the power spectrum

Goal: get the energy at each frequency (each frequency bin).

Calculation: the sum of the squares of the real and imaginary parts of each complex value, i.e. its squared magnitude (here additionally divided by the DFT length K):

power_spec = np.absolute(freq_domain_data) ** 2 * (1/K) # power spectrum

Shape of the power spectrum: (180, 257)
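The squared magnitude and the "real part squared plus imaginary part squared" description are the same quantity, as this sketch checks; the 1/K scaling follows the `power_spec` line above. Random frames stand in for real speech.

```python
import numpy as np

K = 512
frames = np.random.default_rng(1).standard_normal((180, 200))  # stand-in for the windowed frames
freq_domain_data = np.fft.rfft(frames, n=K, axis=1)           # (180, 257) complex

# Squared magnitude, scaled by 1/K
power_spec = np.absolute(freq_domain_data) ** 2 * (1 / K)
# The same quantity written as real part squared plus imaginary part squared
power_spec_2 = (freq_domain_data.real ** 2 + freq_domain_data.imag ** 2) / K

print(power_spec.shape)                       # (180, 257)
print(np.allclose(power_spec, power_spec_2))  # True
```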

5. Mel filtering (Mel filter bank)

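Why: the human ear is not equally sensitive across the speech spectrum; it is more sensitive to low frequencies and less sensitive to high frequencies.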

The mel frequency corresponds one-to-one to the ordinary frequency in Hz; the conversion is as follows:

""" 3. Apply the mel filterbank to the power spectrum, sum the energy in each filter.

    The Mel scale relates perceived frequency, or pitch, of a pure tone to its actual measured frequency.

    Humans are much better at discerning small changes in pitch at low frequencies than they are at high frequencies.

    Incorporating this scale makes our features match more closely what humans hear.

    The formula for converting from frequency to Mel scale is:
        M(f) = 2595*log10(1+f/700)
    And formula for converting from Mel scale to frequency is:
        F(m) = 700*(10**(m/2595)-1)
"""
low_frequency = 20 # We don't start from 0 Hz because the human ear cannot perceive very low frequencies.

high_frequency = fs//2 # if the speech is sampled at fs Hz, the upper frequency is limited to fs/2 Hz (Nyquist). = 4000
low_frequency_mel = 2595 * np.log10(1 + low_frequency / 700) # ≈ 31.75
high_frequency_mel = 2595 * np.log10(1 + high_frequency / 700) # ≈ 2146.06
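The two conversion formulas in the docstring can be written as a pair of inverse functions; the names `hz_to_mel` and `mel_to_hz` are hypothetical. The printed values reproduce the mel limits for 20 Hz and 4000 Hz.

```python
import numpy as np

def hz_to_mel(f):
    """M(f) = 2595 * log10(1 + f / 700)"""
    return 2595 * np.log10(1 + np.asarray(f) / 700)

def mel_to_hz(m):
    """F(m) = 700 * (10 ** (m / 2595) - 1), the inverse of hz_to_mel"""
    return 700 * (10 ** (np.asarray(m) / 2595) - 1)

print(round(float(hz_to_mel(20)), 2))              # 31.75
print(round(float(hz_to_mel(4000)), 2))            # 2146.06
print(np.isclose(mel_to_hz(hz_to_mel(1000.0)), 1000.0))  # True
```

Equal steps in mel correspond to ever wider steps in Hz as frequency rises, which is what gives low frequencies finer resolution in the filter bank.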

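The frequency range [low_frequency, high_frequency] is divided into 40 frequency bands, spaced evenly on the mel scale, and we compute the energy within each band.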

Calculation: construct a bank of filters (one per band), then multiply the power spectrum by the filter-bank matrix, which sums the weighted energy inside each filter.

Shape after mel filtering: (180, 40)

The 40 in the second dimension corresponds to the 40 frequency bands.
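The post does not show how the filters are built. A common choice, assumed in this sketch, is a bank of triangular filters whose edges and centres are equally spaced on the mel scale, with frequencies mapped to FFT bins via floor((K + 1) * f / fs). The helper `mel_filterbank` and that bin-mapping convention are assumptions, not the original code; random values stand in for the power spectrum.

```python
import numpy as np

def mel_filterbank(num_filters=40, K=512, fs=8000, low_hz=20):
    """Triangular mel filter bank of shape (num_filters, K // 2 + 1)."""
    high_hz = fs / 2
    # num_filters + 2 points equally spaced in mel: the edges and centres of the filters
    mel_points = np.linspace(2595 * np.log10(1 + low_hz / 700),
                             2595 * np.log10(1 + high_hz / 700), num_filters + 2)
    hz_points = 700 * (10 ** (mel_points / 2595) - 1)
    # Map each edge/centre frequency to an FFT bin index
    bins = np.floor((K + 1) * hz_points / fs).astype(int)
    fbank = np.zeros((num_filters, K // 2 + 1))
    for m in range(1, num_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):   # rising edge: 0 -> 1
            fbank[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge: 1 -> 0
            fbank[m - 1, k] = (right - k) / (right - centre)
    return fbank

fbank = mel_filterbank()
power_spec = np.random.default_rng(0).random((180, 257))  # stand-in for the (180, 257) power spectrum
mel_energy = power_spec @ fbank.T                          # weighted sum of the power in each filter
print(fbank.shape, mel_energy.shape)  # (40, 257) (180, 40)
```

Writing the filters as a matrix turns "sum the energy in each filter" into one matrix product over all frames at once.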

6. Take the logarithm

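Why: taking the log rescales the vertical (energy) axis, compressing large values and magnifying the differences between low-energy values; think of the shape of the log curve.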

Shape after taking the log: (180, 40)

At this point we have the FBK (filter bank) features.
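One practical detail: a band with exactly zero energy (for example in digitally silent frames) makes the log return -inf. This sketch, which assumes a natural log, floors such values at machine epsilon first; some implementations use 10 * log10 instead to get decibels, which only rescales the result. Random values stand in for the mel energies.

```python
import numpy as np

rng = np.random.default_rng(2)
mel_energy = rng.random((180, 40))  # stand-in for the (180, 40) mel filter-bank energies
mel_energy[0, 0] = 0.0              # simulate a band with exactly zero energy

# Floor exact zeros at machine epsilon so that log() stays finite
floored = np.where(mel_energy == 0, np.finfo(float).eps, mel_energy)
log_fbank = np.log(floored)         # natural log; 10 * np.log10(floored) would give dB instead
print(log_fbank.shape)              # (180, 40)
print(np.isfinite(log_fbank).all()) # True

# The log turns equal ratios into equal steps: 0.01, 0.1, 1 -> about -4.61, -2.30, 0.0
print(np.log([0.01, 0.1, 1.0]))
```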

7. Discrete cosine transform (DCT)

Usually only the first few coefficients after the DCT (e.g. 12 or 20) are kept.

num_ceps = 12 # number of MFCC coefficients kept: cepstral coefficients 2-13 (counting from 1), i.e. indices 1..12 after dropping the 0th

Features from the other (higher) dimensions are dropped because they represent fast changes in the filter bank coefficients and are not helpful for speech models.

from scipy.fftpack import dct
mfcc = dct(log_fbank, type=2, axis=1, norm="ortho")[:, 1 : (num_ceps + 1)]

Shape after the DCT: (180, 12)

At this point we have the MFCC features.
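The post uses SciPy's `dct(..., type=2, norm="ortho")`. For readers without SciPy, this sketch builds the same orthonormal DCT-II as an explicit NumPy matrix (the helper `dct2_ortho` is hypothetical) and keeps coefficients 1 to 12 exactly as the slice `[:, 1 : (num_ceps + 1)]` does. Random data stands in for the log filter-bank features.

```python
import numpy as np

def dct2_ortho(x):
    """Orthonormal DCT-II along the last axis (the transform computed by dct(..., type=2, norm="ortho"))."""
    N = x.shape[-1]
    n = np.arange(N)
    k = n[:, None]
    # basis[k, n] = s(k) * cos(pi * k * (2n + 1) / (2N)), with s(0) = sqrt(1/N), s(k>0) = sqrt(2/N)
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * N))
    basis[0] *= np.sqrt(1 / N)
    basis[1:] *= np.sqrt(2 / N)
    return x @ basis.T

num_ceps = 12
log_fbank = np.random.default_rng(3).standard_normal((180, 40))  # stand-in for the log filter-bank features
mfcc = dct2_ortho(log_fbank)[:, 1:num_ceps + 1]                   # drop c0, keep c1..c12
print(mfcc.shape)  # (180, 12)
```

Because the basis is orthonormal, the DCT only rotates the 40-dimensional log filter-bank vector; the dimensionality reduction comes entirely from the slice.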

Summary:

1. MFCC features are computed from FBK features (by applying a DCT to the log filter-bank energies);

2. MFCC features have a lower dimension than FBK features: FBK features are typically 40-dimensional, and MFCC features 13-dimensional.
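Putting steps 1 to 7 together, this compact sketch returns both features from one waveform, which makes point 1 concrete: the MFCCs are a DCT of the 40-dimensional log filter-bank features. It assumes fs = 8000 Hz, tail zero-padding when framing, triangular filters equally spaced on the mel scale, and an epsilon floor before the log; the function name `fbank_and_mfcc` is hypothetical, and random noise stands in for speech.

```python
import numpy as np

def fbank_and_mfcc(x, fs=8000, coeff=0.95, frame_ms=25, shift_ms=10, K=512,
                   num_filters=40, low_hz=20, num_ceps=12):
    """Return (log_fbank, mfcc) for a 1-D waveform x. A sketch under the stated assumptions."""
    x = np.asarray(x, dtype=float)
    x = np.append(x[0], x[1:] - coeff * x[:-1])                    # 1. pre-emphasis
    L, S = frame_ms * fs // 1000, shift_ms * fs // 1000             # 200 and 80 samples
    n_frames = 1 + int(np.ceil((len(x) - L) / S))                   # 2. framing
    x = np.append(x, np.zeros((n_frames - 1) * S + L - len(x)))     #    zero-pad the tail
    frames = x[S * np.arange(n_frames)[:, None] + np.arange(L)]
    frames = frames * np.hamming(L)                                 # 3. Hamming window
    power = np.abs(np.fft.rfft(frames, n=K, axis=1)) ** 2 / K       # DFT + 4. power spectrum
    mel = np.linspace(2595 * np.log10(1 + low_hz / 700),            # 5. triangular mel filters
                      2595 * np.log10(1 + (fs / 2) / 700), num_filters + 2)
    bins = np.floor((K + 1) * 700 * (10 ** (mel / 2595) - 1) / fs).astype(int)
    k = np.arange(K // 2 + 1)
    left, centre, right = bins[:-2, None], bins[1:-1, None], bins[2:, None]
    rising = (k - left) / np.maximum(centre - left, 1)
    falling = (right - k) / np.maximum(right - centre, 1)
    fb = np.maximum(0.0, np.minimum(rising, falling))               # (num_filters, K // 2 + 1)
    energy = power @ fb.T
    log_fbank = np.log(np.where(energy == 0, np.finfo(float).eps, energy))  # 6. log -> FBK
    n = np.arange(num_filters)                                      # 7. orthonormal DCT-II
    basis = np.cos(np.pi * n[:, None] * (2 * n + 1) / (2 * num_filters))
    basis *= np.where(n == 0, np.sqrt(1 / num_filters), np.sqrt(2 / num_filters))[:, None]
    mfcc = (log_fbank @ basis.T)[:, 1:num_ceps + 1]
    return log_fbank, mfcc

log_fbank, mfcc = fbank_and_mfcc(np.random.default_rng(0).standard_normal(14520))
print(log_fbank.shape, mfcc.shape)  # (180, 40) (180, 12)
```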

Original: https://blog.csdn.net/huang_yx005/article/details/122474081
Author: huang_yx005
Title: FBK和MFCC特征 (FBK and MFCC Features)

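Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/498254/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author.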
