What are the common feature extraction methods in recognition algorithms?

January 3, 2024, 7:48 AM • Artificial Intelligence

The role of feature extraction in recognition algorithms

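Feature extraction plays a central role in recognition algorithms. It converts raw data into feature vectors that represent different targets or objects in a form a classifier can compare. Common techniques come from fields such as digital signal processing, image processing and text analysis. The rest of this article covers one of them in detail, MFCC for speech recognition: its principle, formula derivation, computation steps, and a code example with an explanation of the code.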

MFCC feature extraction in speech recognition

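MFCC (Mel Frequency Cepstral Coefficients) is a widely used feature for speech signals and a standard input to speech recognition systems. The idea is to mimic the frequency resolution of human hearing, which is finer at low frequencies than at high ones, and to summarize each short frame of speech as a small set of coefficients that describe its spectral envelope.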

Algorithm principle

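Pre-emphasis: pass the speech signal through a first-order high-pass filter that boosts the high frequencies. This compensates for the natural high-frequency roll-off of voiced speech and balances the spectrum before the later steps.
Framing: split the pre-emphasized signal into short frames, usually 20 to 40 ms long, with overlap between neighboring frames (a 10 ms hop is common), so that each frame can be treated as approximately stationary.
Fourier transform: apply the FFT (fast Fourier transform) to each frame to convert it from the time domain to the frequency domain.
Mel filter bank: build a set of triangular filters spaced evenly on the mel scale (and therefore increasingly wide in Hz) and multiply them with the spectrum to obtain one output per filter.
Logarithm: take the log of each filter output. This compresses the dynamic range, roughly as human loudness perception does, and turns multiplicative effects on the spectrum into additive ones.
DCT: apply the DCT (discrete cosine transform) to the log filter outputs to decorrelate them; the first 12 or 13 coefficients are usually kept as the MFCCs.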
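The filter bank relies on the mel mapping mel(f) = 2595 log10(1 + f/700), the same one used by the code in this article. The minimal sketch below, with illustrative function names hz_to_mel, mel_to_hz and mel_filter_edges_hz (not from the original) and an assumed 16 kHz sample rate, shows that points equally spaced in mel become more widely spaced in Hz as frequency rises:

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mel: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filter_edges_hz(n_filters, sample_rate):
    """Edge/center frequencies (Hz) of n_filters triangular filters whose
    n_filters + 2 points are spaced evenly in mel from 0 Hz to Nyquist."""
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2), n_filters + 2)
    return mel_to_hz(mel_points)

# Assumed setup: 26 filters, 16 kHz audio (Nyquist = 8 kHz)
edges = mel_filter_edges_hz(n_filters=26, sample_rate=16000)

# Spacing in Hz grows with frequency even though spacing in mel is constant
print(np.round(np.diff(edges)[:3], 1), np.round(np.diff(edges)[-3:], 1))
```

Because the spacing widens at high frequencies, each high-frequency filter averages more FFT bins than a low-frequency one.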

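Formula derivation

The derivation follows the same chain of pre-emphasis, Fourier transform, mel filter bank, logarithm and DCT. For a single frame:

Pre-emphasis: $$s_{\text{preemph}}(n) = s(n)-\alpha \cdot s(n-1)$$, with $$\alpha$$ typically between 0.95 and 0.97 (the code below uses 0.97).
Framing: split the pre-emphasized signal into frames of length N samples with a frame shift of M samples; denote one frame by $$x(n)$$, $$n = 0, \dots, N-1$$.
Fourier transform: apply an $$N_{\text{FFT}}$$-point FFT (zero-padding the frame when $$N < N_{\text{FFT}}$$) to get the spectrum $$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N_{\text{FFT}}}$$, $$k = 0, \dots, N_{\text{FFT}}/2$$.
Mel filter bank: with the mel scale $$\text{mel}(f) = 2595 \log_{10}(1 + f/700)$$, choose $$P + 2$$ points equally spaced in mel between 0 Hz and the Nyquist frequency and map them to FFT bin indices $$f(0) \le f(1) \le \dots \le f(P+1)$$. The m-th triangular filter is $$H_m(k) = \begin{cases} \dfrac{k - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le k < f(m) \\ \dfrac{f(m+1) - k}{f(m+1) - f(m)}, & f(m) \le k < f(m+1) \\ 0, & \text{otherwise} \end{cases}$$ and its output is $$S_m = \sum_{k} |X(k)|\, H_m(k)$$, $$m = 1, \dots, P$$. (The code uses the magnitude spectrum; many implementations use the power spectrum $$|X(k)|^2$$ instead.)
Logarithm: take $$\log S_m$$ for each filter.
DCT: $$c_n = \sum_{m=1}^{P} \log(S_m) \cos\left(\frac{\pi n (m - 0.5)}{P}\right)$$, $$n = 0, 1, \dots$$, keeping the first 12 or 13 coefficients (the code applies an orthonormal scaling on top of this sum).

Repeating this for every frame gives the MFCC feature matrix: one row per frame, where $$c_n$$ is the n-th coefficient of that frame.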
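To make the DCT step concrete, the hedged sketch below evaluates the DCT-II sum c_n = sum over m = 1..P of log(S_m) * cos(pi * n * (m - 0.5) / P) directly and compares it with scipy.fftpack.dct, which the article's code calls. The function name dct2_explicit and the random stand-in log energies are illustrative assumptions, not data from the article. scipy's unnormalized type-2 DCT equals twice this sum, and norm='ortho' multiplies coefficient 0 of the sum by sqrt(1/P) and the others by sqrt(2/P):

```python
import numpy as np
from scipy.fftpack import dct

def dct2_explicit(log_energies):
    """c_n = sum_{m=1}^{P} log_energies[m-1] * cos(pi * n * (m - 0.5) / P), n = 0..P-1."""
    log_energies = np.asarray(log_energies, dtype=float)
    P = log_energies.shape[-1]
    n = np.arange(P)[:, None]          # coefficient index n (rows)
    m = np.arange(1, P + 1)[None, :]   # filter index m = 1..P (columns)
    basis = np.cos(np.pi * n * (m - 0.5) / P)  # shape (P, P)
    return log_energies @ basis.T

# Hypothetical log filter-bank energies for one frame with P = 26 filters
rng = np.random.default_rng(0)
log_S = np.log(rng.uniform(0.1, 10.0, size=26))

c = dct2_explicit(log_S)
print(np.allclose(dct(log_S, type=2), 2 * c))  # True: scipy's sum carries a factor of 2
mfcc_13 = c[:13]  # keep the first 13 coefficients, as in the article's example
```

The explicit matrix form is only for illustration; in practice the FFT-based scipy routine is faster.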

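Computation steps

Pre-emphasis: apply pre-emphasis to the raw speech signal.
Framing: split the pre-emphasized signal into overlapping frames.
Fourier transform: apply the FFT to each frame to obtain its spectrum.
Mel filter bank: weight the spectrum with each triangular filter and sum to get the filter outputs.
Logarithm: take the log of the filter outputs.
DCT: apply the DCT to the log values to obtain the MFCC coefficients.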
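As a quick sanity check of the first two steps, the sketch below applies pre-emphasis to a four-sample toy signal and counts how many 25 ms frames with a 10 ms hop cover one second of audio at an assumed 16 kHz sample rate. The helper name num_frames is hypothetical, and the frame-count rule (zero-pad the end so the last frame is complete) is one common convention, assumed here:

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """y[0] = x[0], y[n] = x[n] - alpha * x[n-1] for n >= 1."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def num_frames(num_samples, frame_len, frame_step):
    """Frames needed so every sample lands in some frame (last frame zero-padded)."""
    return 1 + int(np.ceil(max(num_samples - frame_len, 0) / frame_step))

print(pre_emphasis([1.0, 2.0, 3.0, 4.0]))  # approximately [1.0, 1.03, 1.06, 1.09]

sr = 16000  # assumed sample rate
frame_len = int(round(0.025 * sr))   # 25 ms -> 400 samples
frame_step = int(round(0.010 * sr))  # 10 ms -> 160 samples
print(frame_len, frame_step, num_frames(sr, frame_len, frame_step))  # 400 160 99
```

With 13 coefficients per frame, one second of such audio therefore yields a 99 x 13 MFCC matrix under this convention.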

Python code example and explanation

The following example implements MFCC feature extraction in Python:

import numpy as np
import scipy.io.wavfile as wav
from scipy.fftpack import dct

# Pre-emphasis: first-order high-pass filter y[n] = x[n] - alpha * x[n-1]
def pre_emphasis(signal, alpha=0.97):
    emphasized_signal = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    return emphasized_signal

# Framing: split the signal into overlapping frames of frame_len samples with a hop of frame_step
def split_into_frames(signal, frame_len, frame_step):
    frame_len = int(round(frame_len))    # sample counts must be integers for np.pad and indexing
    frame_step = int(round(frame_step))
    # Enough frames to cover every sample; the last frame is zero-padded if needed
    num_frames = 1 + int(np.ceil(max(len(signal) - frame_len, 0) / frame_step))
    padded_signal_length = (num_frames - 1) * frame_step + frame_len
    padded_signal = np.pad(signal, (0, padded_signal_length - len(signal)), "constant")
    # indices[i, j] = i * frame_step + j: row i holds the sample indices of frame i
    indices = np.tile(np.arange(0, frame_len), (num_frames, 1)) + \
        np.tile(np.arange(0, num_frames * frame_step, frame_step), (frame_len, 1)).T
    frames = padded_signal[indices]
    return frames

# Fourier transform: magnitude spectrum of each frame (n_fft // 2 + 1 bins per frame)
def compute_fft(frames, n_fft):
    mag_frames = np.absolute(np.fft.rfft(frames, n_fft))
    return mag_frames

# Mel filter bank: n_filters triangular filters equally spaced on the mel scale
def mel_filter_bank(n_filters, n_fft, sample_rate):
    low_freq_mel = 0
    high_freq_mel = 2595 * np.log10(1 + (sample_rate / 2) / 700)  # Nyquist frequency in mel
    mel_points = np.linspace(low_freq_mel, high_freq_mel, n_filters + 2)
    frequencies = 700 * (10 ** (mel_points / 2595) - 1)  # back to Hz
    bin_points = np.floor((n_fft + 1) * frequencies / sample_rate).astype(int)  # FFT bin indices

    filter_banks = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        f_m_minus = bin_points[m - 1]  # left edge
        f_m = bin_points[m]            # center
        f_m_plus = bin_points[m + 1]   # right edge

        # Rising slope of the triangle
        for k in range(f_m_minus, f_m):
            filter_banks[m - 1, k] = (k - f_m_minus) / (f_m - f_m_minus)
        # Falling slope of the triangle
        for k in range(f_m, f_m_plus):
            filter_banks[m - 1, k] = (f_m_plus - k) / (f_m_plus - f_m)

    return filter_banks

# Logarithm: compress the dynamic range of the filter-bank outputs
def apply_log(filter_banks):
    return np.log(filter_banks)

# DCT: decorrelate the log filter-bank outputs and keep the first n_coefficients
def dct_transform(filter_banks, n_coefficients):
    return dct(filter_banks, type=2, axis=1, norm='ortho')[:, :n_coefficients]

# Compute MFCC features
def compute_mfcc(file_path, n_filters, n_fft, n_coefficients):
    sample_rate, signal = wav.read(file_path)
    if signal.ndim > 1:
        signal = signal[:, 0]  # keep only the first channel of multi-channel audio
    signal = signal.astype(np.float64)
    emphasized_signal = pre_emphasis(signal)
    # 25 ms frames with a 10 ms hop, converted to integer sample counts
    frames = split_into_frames(emphasized_signal,
                               frame_len=int(round(0.025 * sample_rate)),
                               frame_step=int(round(0.01 * sample_rate)))
    mag_frames = compute_fft(frames, n_fft)
    filter_banks = mel_filter_bank(n_filters, n_fft, sample_rate)
    filter_banks = np.dot(mag_frames, filter_banks.T)  # shape (num_frames, n_filters)
    filter_banks = np.where(filter_banks == 0, np.finfo(float).eps, filter_banks)  # replace zeros with eps so log does not return -inf
    filter_banks = apply_log(filter_banks)
    mfcc = dct_transform(filter_banks, n_coefficients)  # shape (num_frames, n_coefficients)
    return mfcc

# Usage example
file_path = 'audio.wav'
n_filters = 26
n_fft = 512
n_coefficients = 13

mfcc = compute_mfcc(file_path, n_filters, n_fft, n_coefficients)
print(mfcc.shape)

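The code above first defines functions for pre-emphasis, framing, the Fourier transform, the mel filter bank, the logarithm and the DCT. It then reads the speech signal from the WAV file at the given path with wav.read and extracts MFCC features from it.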

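In compute_mfcc, the signal is pre-emphasized and split into 25 ms frames with a 10 ms hop. Each frame is transformed with the FFT to obtain its magnitude spectrum, which is multiplied by the mel filter bank (np.dot) to get one output per filter. These filter outputs are log-compressed and passed through the DCT, and the first n_coefficients values of each frame form the MFCC features, a matrix of shape (number of frames, n_coefficients).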

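Note that before taking the log of the mel filter outputs, the code replaces exact zeros with np.finfo(float).eps, a tiny positive value, so that the log never produces negative infinity.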
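A small check of why this replacement matters, using made-up filter-bank values (hypothetical data, not from the article): np.log of an exact zero returns -inf (and emits a divide-by-zero warning, silenced here), which would turn that frame's DCT output into infinities or NaNs.

```python
import numpy as np

# Hypothetical filter-bank outputs for two frames; some entries are exactly zero
energies = np.array([[0.0, 2.5, 1e-3],
                     [4.0, 0.0, 0.7]])

with np.errstate(divide="ignore"):
    naive = np.log(energies)  # log(0) -> -inf

# Same approach as the article: replace exact zeros with machine epsilon before the log
safe = np.log(np.where(energies == 0, np.finfo(float).eps, energies))
print(np.isfinite(naive).all(), np.isfinite(safe).all())  # False True
```

np.maximum(energies, np.finfo(float).eps) is a common alternative; unlike np.where, it also raises tiny positive values below eps.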

Finally, compute_mfcc is called on the given speech file and the shape of the resulting features is printed.

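We hope the example code and explanation help you understand the principle, formula derivation, computation steps and code details of MFCC extraction. Adjust the parameters (for example n_filters, n_fft, n_coefficients, and the frame length and hop) and the code to suit your own needs.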

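Original article, protected by copyright. When reposting, please credit the source: https://www.johngo689.com/823456/

Reposted articles remain under the original author's copyright; please credit the original author when reposting.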
