[Introduction to Speech Recognition] Feature Extraction (Complete Python Code)

May 25, 2023, 7:02 AM • Artificial Intelligence • 58 views

1. Digital Signal Processing Basics

1.1 Digital Signal Processing Basics

Most signals encountered in science and engineering, such as a voltage varying over time or the temperature over the course of a day, are continuous analog signals. Computers can only process discrete signals, so these analog signals must first be converted into digital signals through sampling and quantization.

Take a sine wave as an example:

$$x(t) = \sin(2\pi f t)$$

where $f$ is the frequency of the signal itself, in Hz.
The sine wave is first sampled once every $t$ seconds (here $t$ denotes the sampling interval), and each sample is represented by a value from a finite range of discrete levels. This gives the discrete signal $x(n)$:

$$x(n) = \sin(2\pi f n t)$$
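As a minimal sketch of the sampling and quantization steps above, the snippet below samples a sine with NumPy and then maps the samples onto 16-bit integer levels. The values f = 5 Hz and fs = 100 Hz, and the 16-bit depth, are assumptions chosen for illustration and do not come from the article.

```python
import numpy as np

f = 5.0          # signal frequency in Hz (assumed for illustration)
fs = 100.0       # sampling rate in Hz (assumed)
t = 1.0 / fs     # sampling interval in seconds
n = np.arange(20)                      # sample indices 0..19 cover one period

# Sampling: x(n) = sin(2*pi*f*n*t)
x = np.sin(2 * np.pi * f * n * t)

# Quantization: map [-1, 1] onto signed 16-bit integer levels
x_q = np.round(x * 32767).astype(np.int16)

print(x_q[:6])
```

The array x still holds real values; only x_q is a digital signal in the strict sense, because both its time axis and its amplitude are discrete.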

1.2 Frequency Aliasing

Sampling changes the spectrum of a signal, and as a result high-frequency and low-frequency components can become indistinguishable. When the sampling frequency is not high enough, the same sample points are consistent with both a low-frequency component and a high-frequency component. When the signal is reconstructed, the high-frequency component is replaced by a low-frequency one: the two waveforms coincide at every sample point, which causes severe distortion.
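The effect described above can be checked numerically. The following sketch, with an assumed sampling rate of 10 Hz, samples a 1 Hz sine and an 11 Hz sine and compares the two sample sequences; the frequencies are chosen only for illustration.

```python
import numpy as np

fs = 10.0                      # sampling rate in Hz (assumed), too low for 11 Hz
n = np.arange(50)              # sample indices
low = np.sin(2 * np.pi * 1.0 * n / fs)    # 1 Hz sine
high = np.sin(2 * np.pi * 11.0 * n / fs)  # 11 Hz sine, above fs / 2 = 5 Hz

# Largest difference between the two sample sequences
max_diff = np.max(np.abs(low - high))
print(max_diff)
```

The difference is zero up to floating-point rounding, so after sampling there is no way to tell the 11 Hz sine from the 1 Hz sine.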

1.3 Nyquist Sampling Theorem

The sampling frequency should be at least twice the maximum frequency in the signal:

$$f_s / 2 \ge f_{max}$$

In other words, to avoid frequency aliasing, at least two samples must be taken within one period of the highest-frequency component of the original signal.
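A small sketch of the condition above. The function names are hypothetical helpers, not part of the article's code. The second function uses the standard folding identity for a real sinusoid to report the frequency at which a tone appears after sampling.

```python
def satisfies_nyquist(f_max, fs):
    """Return True when fs / 2 >= f_max, the condition stated above."""
    return fs / 2.0 >= f_max


def apparent_frequency(f, fs):
    """Frequency (Hz) at which a real sinusoid of frequency f appears after sampling at fs."""
    return abs(f - fs * round(f / fs))


print(satisfies_nyquist(4000, 16000))   # speech band limited to 4 kHz, sampled at 16 kHz
print(apparent_frequency(11.0, 10.0))   # an 11 Hz tone sampled at 10 Hz
```

For a tone that satisfies the condition, apparent_frequency returns the tone's own frequency; otherwise it returns the lower frequency that the tone is mistaken for.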

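1.4 Discrete Fourier Transform (DFT)

The DFT transforms a signal that is discrete and periodic in the time domain into the frequency domain, so that the frequency components of the signal can be analyzed. A non-periodic discrete signal must first be periodically extended before the DFT is applied.

The DFT is discrete and periodic in both the time domain and the frequency domain, which makes it suitable for computer processing.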
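To make the transform concrete, here is a minimal sketch that evaluates the DFT directly from its textbook definition, X(m) = sum over n of x(n) * exp(-j*2*pi*m*n/N), and compares it with NumPy's FFT. The function name naive_dft and the 8 Hz test tone are assumptions for illustration.

```python
import numpy as np

def naive_dft(x):
    """Compute the DFT directly from its definition in O(N^2) time."""
    x = np.asarray(x, dtype=complex)
    N = x.size
    n = np.arange(N)
    m = n.reshape(-1, 1)
    # W[m, n] = exp(-j*2*pi*m*n/N)
    W = np.exp(-2j * np.pi * m * n / N)
    return W @ x

fs = 64
n = np.arange(64)
x = np.sin(2 * np.pi * 8 * n / fs)   # 8 Hz sine, 64 samples at 64 Hz
X = naive_dft(x)

# With 64 samples at 64 Hz each bin is 1 Hz wide, so the peak bin index is the tone frequency
peak_bin = int(np.argmax(np.abs(X[:33])))
print(peak_bin)
```

In practice np.fft.fft computes the same result with an O(N log N) algorithm; the direct form is shown only because it mirrors the definition.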

1.5 Properties of the DFT

Symmetry: $X(m) = X^*(N-m)$ (for a real-valued input signal)
Linearity
Time shifting
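The three properties listed above can be verified numerically. This is a sketch on random test signals; the length N = 16, the coefficients a and b, and the shift k are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
x = rng.standard_normal(N)
y = rng.standard_normal(N)
X, Y = np.fft.fft(x), np.fft.fft(y)
m = np.arange(N)

# Symmetry (real input): X(m) == conj(X(N - m)); index N - m is taken modulo N
symmetry_ok = np.allclose(X, np.conj(X[(N - m) % N]))

# Linearity: DFT(a*x + b*y) == a*X + b*Y
a, b = 2.0, -0.5
linearity_ok = np.allclose(np.fft.fft(a * x + b * y), a * X + b * Y)

# Time shift: a circular delay by k samples multiplies X(m) by exp(-j*2*pi*m*k/N)
k = 3
shift_ok = np.allclose(np.fft.fft(np.roll(x, k)),
                       X * np.exp(-2j * np.pi * m * k / N))

print(symmetry_ok, linearity_ok, shift_ok)
```

The symmetry property is the reason the later code keeps only the first fft_len/2 + 1 bins of each frame: for real input the remaining bins carry no additional information.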

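2. Feature Extraction Pipeline

FBank and MFCC extraction pipeline

2.1 Pre-emphasis

Pre-emphasis boosts the energy of the high-frequency part of the signal.
The pre-emphasis filter is a first-order high-pass filter. Given a time-domain input signal $x[n]$, the pre-emphasized signal is $y[n] = x[n] - a \cdot x[n-1]$, where $0.9 \le a \le 1.0$.

Code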

np.append(signal[0], signal[1:] - coeff * signal[:-1])
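To see why this one-liner acts as a high-pass filter, the sketch below applies it to the two extreme cases: a constant signal (lowest frequency) and a signal that alternates sign on every sample (highest representable frequency). The coefficient 0.97 matches the article's default; the two test signals are assumptions for illustration.

```python
import numpy as np

def preemphasis(signal, coeff=0.97):
    """y[n] = x[n] - coeff * x[n-1]; the first sample is passed through unchanged."""
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

dc = np.ones(8)                                        # constant signal
alt = np.array([1., -1., 1., -1., 1., -1., 1., -1.])   # alternating signal at fs / 2

dc_out = preemphasis(dc)
alt_out = preemphasis(alt)

print(dc_out[1:])    # every value is 1 - 0.97
print(alt_out[1:])   # every value has magnitude 1 + 0.97
```

The constant signal is attenuated to 0.03 while the alternating signal is amplified to a magnitude of 1.97, which is the high-frequency boost described above.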

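2.2 Framing and Windowing (enframe)

A speech signal is non-stationary: its statistical properties change over time. Over short intervals, however, it can be treated as stationary. In speech recognition, a sentence is also recognized in terms of smaller pronunciation units (phonemes, words, or characters), so a sliding window is used to extract short segments.

Framing has three parameters: frame length, frame shift, and window function. For a signal sampled at 16 kHz, the frame length and frame shift are typically 25 ms and 10 ms, that is, 400 and 160 samples. In the time domain, framing multiplies the original signal by a window function, $y[n] = w[n]x[n]$, where $w[n]$ is the window function; the rectangular window and the Hamming window are the common choices. Note that the rectangular window is generally not used directly. Windowing truncates the signal in the time domain, and multiplying by the window in the time domain is equivalent to convolving the corresponding spectra in the frequency domain. The rectangular window has a narrow main lobe but large side lobes, so convolving its spectrum with the spectrum of the original signal causes spectral leakage.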
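The side-lobe argument above can be quantified. The sketch below estimates the highest side-lobe level of a 400-sample rectangular window and a 400-sample Hamming window from their zero-padded spectra. The helper name peak_sidelobe_db and the FFT size 8192 are assumptions made for this illustration.

```python
import numpy as np

def peak_sidelobe_db(window, nfft=8192):
    """Return the highest side-lobe level of a window, in dB relative to the main-lobe peak."""
    mag = np.abs(np.fft.rfft(window, nfft))
    db = 20 * np.log10(mag / mag[0] + 1e-12)
    # Walk down the main lobe until the spectrum starts rising again (first null)
    i = 0
    while db[i + 1] < db[i]:
        i += 1
    # Everything from the first null onward belongs to the side lobes
    return db[i:].max()

frame_len = 400
rect_sll = peak_sidelobe_db(np.ones(frame_len))
hamming_sll = peak_sidelobe_db(np.hamming(frame_len))

print(round(rect_sll, 1), round(hamming_sll, 1))
```

The rectangular window's side lobes are only about 13 dB below its main lobe, while the Hamming window's are more than 40 dB down, which is why the Hamming window leaks far less energy into neighboring frequencies.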

Code

import numpy as np

frame_len = 400    # 25 ms at 16 kHz
frame_shift = 160  # 10 ms at 16 kHz

def enframe(signal, frame_len=frame_len, frame_shift=frame_shift, win=np.hamming(frame_len)):
    """Enframe with Hamming window function.

        :param signal: The signal to be enframed
        :param win: window function, default Hamming
        :returns: the enframed signal, num_frames by frame_len array
    """
    num_samples = signal.size
    # num_frames: total number of frames
    # frame_len: number of samples in one frame
    # frame_shift: number of samples between the starts of adjacent frames
    num_frames = int(np.floor((num_samples - frame_len) / frame_shift)) + 1
    frames = np.zeros((num_frames, frame_len))
    for i in range(num_frames):
        frames[i, :] = signal[i * frame_shift:i * frame_shift + frame_len]
        frames[i, :] = frames[i, :] * win
    return frames

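2.3 Fourier Transform

Each speech frame produced by the previous step is transformed from the time domain to the frequency domain with the DFT. Taking the magnitude of the DFT coefficients gives the spectral features (this is how a spectrogram is generated).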


def get_spectrum(frames, fft_len=fft_len):
    """Get spectrum using fft
        :param frames: the enframed signal, num_frames by frame_len array
        :param fft_len: FFT length, default 512
        :returns: spectrum, a num_frames by fft_len/2+1 array (real)
"""
    cFFT = np.fft.fft(frames, n=fft_len)
    valid_len = int(fft_len / 2) + 1
    spectrum = np.abs(cFFT[:, 0:valid_len])
    return spectrum
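As a usage sketch for the spectrum step, the snippet below builds one Hamming-windowed frame of a 1000 Hz tone, computes its magnitude spectrum the same way get_spectrum does, and maps the peak bin back to a frequency in Hz. The 16 kHz sampling rate and the test tone are assumptions. It also shows np.fft.rfft, which returns the same fft_len/2 + 1 bins for real input.

```python
import numpy as np

fs = 16000                      # sampling rate in Hz (assumed)
frame_len, fft_len = 400, 512   # same defaults as the article

# One Hamming-windowed frame of a 1000 Hz sine (test tone chosen for illustration)
n = np.arange(frame_len)
frames = (np.sin(2 * np.pi * 1000 * n / fs) * np.hamming(frame_len))[np.newaxis, :]

# Same computation as get_spectrum: full FFT, keep the first fft_len/2 + 1 bins
spectrum = np.abs(np.fft.fft(frames, n=fft_len)[:, :fft_len // 2 + 1])

# For real input, rfft returns exactly those bins without computing the redundant half
spectrum_rfft = np.abs(np.fft.rfft(frames, n=fft_len))

# Center frequency of each bin: k * fs / fft_len
freqs = np.fft.rfftfreq(fft_len, d=1.0 / fs)
peak_bin = int(np.argmax(spectrum[0]))
print(peak_bin, freqs[peak_bin])
```

Each bin is fs / fft_len = 31.25 Hz wide here, so the 1000 Hz tone lands on bin 32.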

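2.4 Mel Filter Bank and Log Operation

The DFT gives the energy of the signal in each frequency band, but the human ear does not perceive frequency on an evenly spaced scale; its response is approximately logarithmic. Linear frequency is therefore converted to mel frequency, using the relationship:

$$mel = 2595 \log_{10}(1 + f / 700)$$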
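The conversion above and its inverse can be sketched as two small functions. The names hz_to_mel and mel_to_hz are hypothetical helpers, and the 8000 Hz upper limit assumes a 16 kHz sampling rate.

```python
import numpy as np

def hz_to_mel(f):
    """mel = 2595 * log10(1 + f / 700)"""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse mapping: f = 700 * (10^(mel / 2595) - 1)"""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

# 25 points equally spaced in mel between 0 Hz and 8000 Hz, converted back to Hz
edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 25))
spacing_hz = np.diff(edges_hz)

print(float(hz_to_mel(1000.0)))
print(spacing_hz[0], spacing_hz[-1])
```

Points that are equally spaced in mel grow further apart in Hz as frequency increases, which matches the ear's coarser resolution at high frequencies.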

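Mel triangular filter bank: the coefficients of each filter are determined by its start, center, and cutoff frequencies.

Mel filter bank design:

(1) Choose the number of filters P.
(2) Given the sampling rate $f_s$, the number of DFT points N, and the number of filters P, generate the start, center, and cutoff frequencies of each filter at equal intervals on the mel scale. Note that the center frequency of one filter is the start frequency of the next, so adjacent filters overlap.
(3) Convert the start, center, and cutoff frequencies of each triangular filter from the mel scale back to linear frequency, filter the spectral features obtained from the DFT to get P filter bank energies, and take the log to obtain the FBank features.
MFCC features are obtained from the FBank features by further applying an IDFT-type transform (the code in this article uses a DCT) and related operations.

Code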

import numpy as np

def mel_filter(frame_pow, fs, n_filter, nfft):
    """
    Apply a mel filter bank and take the log.
    :param frame_pow: power spectrum of the framed signal
    :param fs: sampling rate in Hz
    :param n_filter: number of filters
    :param nfft: number of FFT points
    :return: log of the mel-filtered power spectrum
    mel = 2595 * log10(1 + f / 700)     # frequency to mel
    f = 700 * (10^(mel / 2595) - 1)     # mel to frequency
    In essence, this maps the frequency f onto a logarithmic scale.
    """
    mel_min = 0
    mel_max = 2595 * np.log10(1 + fs / 2.0 / 700)
    # n_filter triangles need n_filter + 2 edge points, equally spaced in mel
    mel_points = np.linspace(mel_min, mel_max, n_filter + 2)
    hz_points = 700 * (10 ** (mel_points / 2595.0) - 1)
    # Convert the edge frequencies in Hz to FFT bin indices
    filter_edge = np.floor(hz_points * (nfft + 1) / fs)

    fbank = np.zeros((n_filter, int(nfft / 2 + 1)))
    for m in range(1, 1 + n_filter):
        f_left = int(filter_edge[m - 1])
        f_center = int(filter_edge[m])
        f_right = int(filter_edge[m + 1])

        # Rising slope of the triangle
        for k in range(f_left, f_center):
            fbank[m - 1, k] = (k - f_left) / (f_center - f_left)
        # Falling slope of the triangle
        for k in range(f_center, f_right):
            fbank[m - 1, k] = (f_right - k) / (f_right - f_center)

    filter_banks = np.dot(frame_pow, fbank.T)
    # Replace zeros so that the log is defined
    filter_banks = np.where(filter_banks == 0, np.finfo(float).eps, filter_banks)

    filter_banks = 20 * np.log10(filter_banks)
    return filter_banks

2.5 Dynamic Feature Computation

First-order difference ($\Delta$): $\Delta(t) = \frac{c(t+1) - c(t-1)}{2}$ (analogous to velocity)
Second-order difference ($\Delta\Delta$): $\Delta\Delta(t) = \frac{\Delta(t+1) - \Delta(t-1)}{2}$
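The two difference formulas above can be sketched with array slicing. The function name delta is a hypothetical helper, and repeating the first and last frames at the edges is an assumption, since the formulas do not say how to handle the boundaries.

```python
import numpy as np

def delta(feats):
    """First-order difference along time: d(t) = (c(t+1) - c(t-1)) / 2.

    feats: num_frames by feature_dim array. The first and last frames are
    repeated once so that the output has the same shape (an assumption).
    """
    padded = np.pad(feats, ((1, 1), (0, 0)), mode='edge')
    return (padded[2:] - padded[:-2]) / 2.0

# Toy one-dimensional feature c(t) = t^2 over 10 frames
c = (np.arange(10.0) ** 2).reshape(-1, 1)
d1 = delta(c)     # first-order difference
d2 = delta(d1)    # second-order difference: the same operation applied again

print(d1[1:-1, 0])
print(d2[2:-2, 0])
```

For c(t) = t^2 the interior first-order values equal 2t and the interior second-order values equal 2, the familiar velocity and acceleration of a quadratic trajectory.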

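2.6 Energy Computation

$$e = \sum_n x^2[n]$$
MFCC summary: the commonly used MFCC feature vector has 39 dimensions: 12 static MFCCs + 12 first-order differences + 12 second-order differences + 1 static energy + 1 first-order energy difference + 1 second-order energy difference.
MFCC features are generally used for training diagonal-covariance GMMs, because the correlation between their dimensions is small.
FBank features are generally used for DNN training.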
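A sketch of how the 39 dimensions described above fit together. Random arrays stand in for real frames and MFCCs, the helper names are hypothetical, and the column order (13 static values, then their first-order and second-order differences) is one possible layout rather than the article's.

```python
import numpy as np

def frame_energy(frames):
    """Energy of each frame: e = sum over n of x[n]^2. frames: num_frames by frame_len."""
    return np.sum(frames ** 2, axis=1, keepdims=True)

def delta(feats):
    """(c(t+1) - c(t-1)) / 2 along time, with edge frames repeated (an assumption)."""
    padded = np.pad(feats, ((1, 1), (0, 0)), mode='edge')
    return (padded[2:] - padded[:-2]) / 2.0

rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 400))      # stand-in for enframed audio
mfcc_feats = rng.standard_normal((50, 12))   # stand-in for 12-dimensional MFCCs

static = np.hstack([mfcc_feats, frame_energy(frames)])   # 12 MFCC + 1 energy = 13
d1 = delta(static)                                       # 13 first-order differences
d2 = delta(d1)                                           # 13 second-order differences
features_39 = np.hstack([static, d1, d2])

print(features_39.shape)
```

Computing the differences on the stacked 13-dimensional static vector yields the energy differences together with the MFCC differences in one pass.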

Code


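from scipy.fftpack import dct

num_ceps = 12
# Type-II DCT of the log mel filter bank features; keep coefficients 1..num_ceps
mfcc = dct(filter_banks, type=2, axis=1, norm='ortho')[:, 1:(num_ceps+1)]
plot_spectrogram(mfcc.T, 'MFCC Coefficients', 'mfcc.png')  # three-argument signature from section 3.3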

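3. Feature Extraction Practice

Given an audio clip, extract 12-dimensional MFCC features and 23-dimensional FBank features. Read the pre-emphasis, framing, and windowing parts of the code, complete the FBank and MFCC feature extraction parts of the assignment code, and store the final FBank and MFCC features as plain text. Use the default configuration parameters; they do not need to be changed.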

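3.1 Code Files

Dependencies:
python3
librosa
To inspect the feature spectrograms, make sure matplotlib is installed and uncomment the related code.
Note: do not change the default output file names test.fbank and test.mfcc.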

3.2 File Layout

mfcc.py: assignment code
test.wav: test audio
Readme.md: documentation

3.3 Complete Experiment Code

import librosa
import numpy as np
from scipy.fftpack import dct

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
def plot_spectrogram(spec, note,file_name):
    """Draw the spectrogram picture
        :param spec: a feature_dim by num_frames array(real)
        :param note: title of the picture
        :param file_name: name of the file
"""
    fig = plt.figure(figsize=(20, 5))
    heatmap = plt.pcolor(spec)
    fig.colorbar(mappable=heatmap)
    plt.xlabel('Time(s)')
    plt.ylabel(note)
    plt.tight_layout()
    plt.savefig(file_name)

alpha = 0.97

frame_len = 400
frame_shift = 160
fft_len = 512

num_filter = 23
num_mfcc = 12

wav, fs = librosa.load('./test.wav', sr=None)

def preemphasis(signal, coeff=alpha):
    """perform preemphasis on the input signal.

        :param signal: The signal to filter.

        :param coeff: The preemphasis coefficient. 0 is no filter, default is 0.97.

        :returns: the filtered signal.

"""
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

def enframe(signal, frame_len=frame_len, frame_shift=frame_shift, win=np.hamming(frame_len)):
    """Enframe with Hamming window function.

        :param signal: The signal to be enframed
        :param win: window function, default Hamming
        :returns: the enframed signal, num_frames by frame_len array
"""

    num_samples = signal.size
    num_frames = np.floor((num_samples - frame_len) / frame_shift)+1
    frames = np.zeros((int(num_frames),frame_len))
    for i in range(int(num_frames)):
        frames[i,:] = signal[i*frame_shift:i*frame_shift + frame_len]
        frames[i,:] = frames[i,:] * win

    return frames

def get_spectrum(frames, fft_len=fft_len):
    """Get spectrum using fft
        :param frames: the enframed signal, num_frames by frame_len array
        :param fft_len: FFT length, default 512
        :returns: spectrum, a num_frames by fft_len/2+1 array (real)
"""
    cFFT = np.fft.fft(frames, n=fft_len)
    valid_len = int(fft_len / 2 ) + 1
    spectrum = np.abs(cFFT[:,0:valid_len])
    return spectrum

def fbank(spectrum, num_filter = num_filter):
    """Get mel filter bank feature from spectrum
        :param spectrum: a num_frames by fft_len/2+1 array(real)
        :param num_filter: mel filters number, default 23
        :returns: fbank feature, a num_frames by num_filter array
        DON'T FORGET THE LOG OPERATION AFTER THE MEL FILTER!

"""
    low_mel_freq = 0
    high_mel_freq = 2595 * np.log10(1+(fs /2)/700)
    mel_filters_points = np.linspace(low_mel_freq,high_mel_freq,num_filter+2)
    freq_filters_pints = (700 * (np.power(10.,(mel_filters_points/2595))-1))
    freq_bin = np.floor(freq_filters_pints / (fs /2)*(fft_len /2 + 1))
    feats=np.zeros((int(fft_len/2+1), num_filter))
    for  m in range(1,num_filter+1):
        bin_low = int(freq_bin[m-1])
        bin_medium = int(freq_bin[m])
        bin_high = int(freq_bin[m+1])
        for k in range(bin_low,bin_medium):
            feats[k,m-1]=(k-freq_bin[m-1])/(freq_bin[m]-freq_bin[m-1])
        for k in range(bin_medium,bin_high):
            feats[k,m-1]=(freq_bin[m+1]-k)/(freq_bin[m+1]-freq_bin[m])
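    feats = np.dot(spectrum,feats)
    # Replace zeros so that log10 is defined for frames with no energy in a filter
    feats = np.where(feats == 0, np.finfo(float).eps, feats)
    feats = 20 *np.log10(feats)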
    return feats

def mfcc(fbank, num_mfcc = num_mfcc):
    """Get mfcc feature from fbank feature
        :param fbank: a num_frames by  num_filter array(real)
        :param num_mfcc: mfcc number, default 12
        :returns: mfcc feature, a num_frames by num_mfcc array
"""

    mfcc = dct(fbank, type=2, axis=1, norm='ortho')[:, 1:(num_mfcc+1)]
    return mfcc

def write_file(feats, file_name):
    """Write the feature to file
        :param feats: a num_frames by feature_dim array(real)
        :param file_name: name of the file
"""
    (row,col) = feats.shape
    # The with block closes the file even if a write fails
    with open(file_name,'w') as f:
        for i in range(row):
            f.write('[')
            for j in range(col):
                f.write(str(feats[i,j])+' ')
            f.write(']\n')

def main():
    wav, fs = librosa.load('./test.wav', sr=None)
    signal = preemphasis(wav)
    frames = enframe(signal)
    spectrum = get_spectrum(frames)
    fbank_feats = fbank(spectrum)
    mfcc_feats = mfcc(fbank_feats)
    # plot_spectrogram expects a feature_dim by num_frames array, so transpose
    plot_spectrogram(fbank_feats.T, 'Filter Bank','fbank.png')
    write_file(fbank_feats,'./test.fbank')
    plot_spectrogram(mfcc_feats.T, 'MFCC','mfcc.png')
    write_file(mfcc_feats,'./test.mfcc')

if __name__ == '__main__':
    main()
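The text files written by write_file contain one bracketed row per frame, with values separated by spaces. As a sketch of how such a file could be read back for inspection, the hypothetical helper read_feats below parses that layout. It is demonstrated on a small temporary file rather than on the real test.fbank output.

```python
import os
import tempfile
import numpy as np

def read_feats(file_name):
    """Parse a feature file written as one '[v1 v2 ... ]' row per line."""
    rows = []
    with open(file_name) as f:
        for line in f:
            # Drop surrounding whitespace, then the enclosing brackets
            line = line.strip().strip('[]')
            if line:
                rows.append([float(v) for v in line.split()])
    return np.array(rows)

# Round-trip check on a small temporary file in the same format
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.fbank')
    with open(path, 'w') as f:
        f.write('[1.5 -2.0 3.25 ]\n[0.0 4.0 -1.0 ]\n')
    feats = read_feats(path)

print(feats.shape)
```

With the real output, the array read back from test.fbank would have num_frames rows and 23 columns, and the one from test.mfcc would have 12 columns.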

3.4 Experiment Results

FBank:

MFCC:

Original: https://blog.csdn.net/weixin_51293984/article/details/126500127
Author: 一个很菜的小猪
Title: [Introduction to Speech Recognition] Feature Extraction (Complete Python Code)

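Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/512803/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author.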
