# Speech Recognition: Principles and Applications, Chapter 3: Speech Feature Extraction

3.1 Preprocessing

3.2 Short-Time Fourier Transform

3.3 Auditory Characteristics

3.4 Linear Prediction

3.5 Cepstral Analysis

3.6 Common Acoustic Features

The raw speech signal is a variable-length time series, which is not suitable as direct input to traditional machine learning algorithms. It generally has to be converted into a particular feature-vector representation; this process is called *speech feature extraction*.

With the development of deep learning, the raw waveform can also be fed directly into a neural network. However, the raw signal is highly redundant in the time domain, which makes training harder, so feature extraction remains one of the key steps in speech signal processing.

## 3.1 Preprocessing

First, the raw time-domain speech signal is preprocessed through *pre-emphasis*, *framing*, and *windowing*.

(1) Pre-emphasis

(2) Framing

(3) Windowing
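
The first two steps can be sketched as follows. This is a minimal pure-Python illustration, not the book's code: the pre-emphasis filter $y(n) = x(n) - \alpha x(n-1)$ with $\alpha = 0.97$, the 25 ms frame length, and the 10 ms frame shift are common choices assumed here, and the helper names `pre_emphasis` and `frame_signal` are hypothetical.

```python
import math
from typing import List


def pre_emphasis(x: List[float], alpha: float = 0.97) -> List[float]:
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1], with y[0] = x[0].
    alpha = 0.97 is a common choice (an assumption, not taken from the source)."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def frame_signal(x: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split x into overlapping frames of frame_len samples, advancing hop samples
    per frame; the final partial frame is zero-padded to full length."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while True:
        frame = x[start:start + frame_len]
        frames.append(frame + [0.0] * (frame_len - len(frame)))
        if start + frame_len >= len(x):  # this frame already reaches the end
            break
        start += hop
    return frames


# Hypothetical example: 1 s of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
frame_len = sample_rate * 25 // 1000  # 25 ms -> 400 samples
hop = sample_rate * 10 // 1000        # 10 ms -> 160 samples
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
frames = frame_signal(pre_emphasis(signal), frame_len, hop)
print(len(frames), len(frames[0]))  # 99 400
```

Each frame is short enough to be treated as quasi-stationary; with these settings, consecutive frames share 400 - 160 = 240 samples.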

The framing described above is equivalent to multiplying the speech signal by a rectangular window, which truncates the signal abruptly in the time domain. In the frequency domain, the rectangular window has a narrow main lobe but large sidelobes, so energy from each frequency component spreads into neighbouring frequencies, causing severe spectral leakage.

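To reduce this leakage, each frame is multiplied by a window function that tapers smoothly toward zero at both ends. A common choice in speech processing is the Hamming window:

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$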
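
A minimal sketch of windowing, assuming the Hamming window $w(n) = 0.54 - 0.46\cos(2\pi n/(N-1))$ for illustration; `hamming` and `apply_window` are hypothetical helper names. The window tapers to 0.08 at both ends instead of cutting off abruptly like the rectangular window.

```python
import math
from typing import List


def hamming(N: int) -> List[float]:
    """Hamming window w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1)), n = 0..N-1."""
    if N <= 0:
        raise ValueError("N must be positive")
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def apply_window(frame: List[float], window: List[float]) -> List[float]:
    """Multiply a frame sample by sample with a window of the same length."""
    if len(frame) != len(window):
        raise ValueError("frame and window must have the same length")
    return [s * w for s, w in zip(frame, window)]


w = hamming(5)
print([round(v, 2) for v in w])                         # [0.08, 0.54, 1.0, 0.54, 0.08]
print(apply_window([1.0, 1.0, 1.0, 1.0, 1.0], w) == w)  # True
```

In a full pipeline, `apply_window` would be called on every frame produced by framing, using one window of the frame length.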

## 3.2 Short-Time Fourier Transform

Each frequency component of a signal can be represented by a sine wave, i.e., modeled with a sinusoidal function. Using Euler's formula, sinusoids can be written in a unified complex-exponential form.
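
For reference, Euler's formula and the exponential forms of the cosine and sine that follow from it (writing $j$ for the imaginary unit, as is usual in signal processing):

$$
\begin{aligned}
e^{j\omega t} &= \cos(\omega t) + j\sin(\omega t) \\
\cos(\omega t) &= \frac{e^{j\omega t} + e^{-j\omega t}}{2} \\
\sin(\omega t) &= \frac{e^{j\omega t} - e^{-j\omega t}}{2j}
\end{aligned}
$$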

Sinusoids are orthogonal: the integral of the product of two sinusoids with different frequencies over their common period is zero. In complex-exponential form:

$$\int_{0}^{T} e^{j\omega_1 t}\, e^{-j\omega_2 t}\,\mathrm{d}t = 0, \quad \text{if } \omega_1 \neq \omega_2,$$

where $T$ is a common period of both signals.
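
The discrete counterpart can be checked numerically. Sampling one common period at $N$ points, $\sum_{n=0}^{N-1} e^{j2\pi kn/N}\, e^{-j2\pi mn/N}$ equals $N$ when $k = m$ and $0$ otherwise; the sketch below (function names hypothetical) verifies this for $N = 8$.

```python
import cmath
import math


def inner_product(k: int, m: int, N: int) -> complex:
    """Sum over one period of e^{j2πkn/N} times the conjugate of e^{j2πmn/N}."""
    return sum(
        cmath.exp(2j * math.pi * k * n / N) * cmath.exp(-2j * math.pi * m * n / N)
        for n in range(N)
    )


def max_orthogonality_error(N: int) -> float:
    """Largest deviation from N (if k == m) or 0 (if k != m) over all bin pairs."""
    return max(
        abs(inner_product(k, m, N) - (N if k == m else 0))
        for k in range(N)
        for m in range(N)
    )


print(max_orthogonality_error(8) < 1e-9)  # True
```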

Because sinusoids are orthogonal, the components at different frequencies can be separated from a speech signal by correlation: correlating the signal with a sinusoid of a given frequency keeps only the component at that frequency and cancels all others.

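Correlating an $N$-sample frame $x(n)$ with the complex exponential at each frequency gives the discrete Fourier transform (DFT):

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi kn/N}, \quad k = 0, 1, \dots, N-1$$

The DFT coefficients $X(k)$ are generally complex-valued, because the basis functions $e^{-j2\pi kn/N}$ are complex.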
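
Correlation-based separation can be sketched directly as the DFT, i.e. correlating the frame with $e^{-j2\pi kn/N}$ at every bin $k$. This is a minimal O(N²) illustration rather than production code, and the two-tone test signal is hypothetical: a cosine at bin 2 plus a half-amplitude sine at bin 5 of a 16-point frame.

```python
import cmath
import math
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Direct DFT: X(k) = sum_n x(n) * e^{-j 2 pi k n / N}, i.e. the correlation
    of x with the complex exponential at each frequency bin k (O(N^2) operations)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


# Hypothetical test signal: two sinusoids at bins 2 and 5 of a 16-point frame.
N = 16
x = [math.cos(2 * math.pi * 2 * n / N) + 0.5 * math.sin(2 * math.pi * 5 * n / N)
     for n in range(N)]
X = dft(x)
peaks = [k for k in range(N // 2 + 1) if abs(X[k]) > 1e-6]
print(peaks)  # [2, 5]
```

Only bins 2 and 5 (and their mirror images at 16 - 2 and 16 - 5) come out nonzero, with $|X(2)| = N/2 = 8$ and $|X(5)| = 0.5 \cdot N/2 = 4$: correlation with every other exponential vanishes by orthogonality.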

In speech signal processing, we are mainly interested in the magnitude of the spectrum, called the amplitude spectrum. For a DFT coefficient $X(k)$ it is:

$$|X(k)| = \sqrt{\mathrm{Re}\{X(k)\}^2 + \mathrm{Im}\{X(k)\}^2}$$

The energy spectrum is the square of the amplitude spectrum:

$$|X(k)|^2 = \mathrm{Re}\{X(k)\}^2 + \mathrm{Im}\{X(k)\}^2$$
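
In code, both spectra follow directly from the real and imaginary parts of each DFT coefficient. The helper names and the three coefficients below are hypothetical.

```python
import math
from typing import List


def amplitude_spectrum(X: List[complex]) -> List[float]:
    """|X(k)| = sqrt(Re{X(k)}^2 + Im{X(k)}^2) for each DFT coefficient."""
    return [math.hypot(c.real, c.imag) for c in X]


def energy_spectrum(X: List[complex]) -> List[float]:
    """|X(k)|^2 = Re{X(k)}^2 + Im{X(k)}^2, the square of the amplitude spectrum."""
    return [c.real ** 2 + c.imag ** 2 for c in X]


X = [3 + 4j, -1j, 2 + 0j]     # hypothetical DFT coefficients
print(amplitude_spectrum(X))  # [5.0, 1.0, 2.0]
print(energy_spectrum(X))     # [25.0, 1.0, 4.0]
```

For a real-valued frame of length $N$, $|X(k)| = |X(N-k)|$, so in practice only bins $0$ through $N/2$ are usually kept.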

Most sounds produced by real sound sources are compound sounds, made up of components with different intensities and frequencies.

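By exploiting the odd/even and real/imaginary symmetry properties of the complex exponentials, the fast Fourier transform (FFT) reduces the computational complexity, computing the DFT in $O(N\log_2 N)$ time rather than the $O(N^2)$ required by direct evaluation.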
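
One standard way to reach $O(N\log_2 N)$ is the recursive radix-2 decimation-in-time FFT, sketched below as an illustration (it is one FFT variant among several, and it assumes $N$ is a power of two). Splitting the input into even- and odd-indexed samples turns one $N$-point DFT into two $N/2$-point DFTs combined with twiddle factors $e^{-j2\pi k/N}$.

```python
import cmath
import math
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of two")
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # N/2-point DFT of even-indexed samples
    odd = fft(x[1::2])   # N/2-point DFT of odd-indexed samples
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * math.pi * k / N) * odd[k]  # twiddle factor times odd part
        X[k] = even[k] + t            # first half of the spectrum
        X[k + N // 2] = even[k] - t   # second half reuses the same products
    return X


print([round(abs(v), 6) for v in fft([1, 1, 1, 1])])  # [4.0, 0.0, 0.0, 0.0]
```

Each level of recursion does $O(N)$ work and there are $\log_2 N$ levels, which is where the $O(N\log_2 N)$ cost comes from.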

## 3.3 Auditory Characteristics

Human perception of sound differs across frequencies:

Below 1 kHz, perception scales approximately linearly with frequency.

Above 1 kHz, perception scales approximately logarithmically with frequency.
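
This frequency dependence is commonly modeled by the Mel scale. One widely used formula (the HTK-style definition, assumed here; other variants such as Slaney's exist) is $m = 2595\log_{10}(1 + f/700)$, which is close to linear at low frequencies, close to logarithmic at high frequencies, and maps 1000 Hz to roughly 1000 mel. A minimal sketch with hypothetical helper names:

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style Mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


# Print the Mel value for a few frequencies.
for f in (250, 500, 1000, 2000, 4000, 8000):
    print(f, round(hz_to_mel(f), 1))
```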

## 3.5 Cepstral Analysis

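(1) Fourier transform. Modeling speech as an excitation $e(n)$ convolved with the vocal-tract impulse response $h(n)$, i.e. $x(n) = e(n) * h(n)$, the Fourier transform turns this time-domain convolution into a frequency-domain product:

$$X(\omega) = E(\omega)\,H(\omega)$$

(2) Logarithm. Taking the logarithm of the magnitude turns the product into a sum:

$$\log|X(\omega)| = \log|E(\omega)| + \log|H(\omega)|$$

(3) Inverse Fourier transform. Transforming back yields the cepstrum of the speech signal, whose index is again a time-like variable (the quefrency).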
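
The three steps map directly onto code. The sketch below computes the real cepstrum $c(n) = \mathrm{IDFT}\{\log|\mathrm{DFT}\{x(n)\}|\}$ with a direct O(N²) DFT for clarity; the floor `eps` on the magnitude (to avoid $\log 0$), the helper names, and the toy "excitation" and "vocal tract" sequences are all illustrative assumptions. It also checks the property the steps rely on: because the DFT turns circular convolution into multiplication, the cepstrum of a circular convolution equals the sum of the individual cepstra.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Direct O(N^2) DFT; with inverse=True, the inverse DFT including the 1/N factor."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out


def real_cepstrum(x: Sequence[float], eps: float = 1e-12) -> List[float]:
    """(1) DFT, (2) log magnitude (floored at eps to avoid log 0), (3) inverse DFT."""
    log_mag = [math.log(max(abs(v), eps)) for v in dft(x)]
    # log_mag is real and even for real x, so the inverse DFT is real up to rounding.
    return [v.real for v in dft(log_mag, inverse=True)]


def circular_convolve(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Circular convolution of two equal-length sequences."""
    N = len(a)
    return [sum(a[m] * b[(n - m) % N] for m in range(N)) for n in range(N)]


N = 8
excitation = [1.0, 0.5] + [0.0] * (N - 2)        # hypothetical source sequence
vocal_tract = [1.0, 0.0, 0.3] + [0.0] * (N - 3)  # hypothetical filter response
speech = circular_convolve(excitation, vocal_tract)

lhs = real_cepstrum(speech)
rhs = [a + b for a, b in zip(real_cepstrum(excitation), real_cepstrum(vocal_tract))]
print(max(abs(a - b) for a, b in zip(lhs, rhs)) < 1e-9)  # True
```

Because the two components add in the cepstral domain, they can be separated there, for example by keeping only the low-quefrency coefficients to approximate the vocal-tract contribution.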

## 3.6 Common Acoustic Features

Original: https://blog.csdn.net/hnlg311709000526/article/details/120952729
Author: 楚歌汉水
Title: 语音识别原理与应用 第三章 语音特征提取 (Speech Recognition: Principles and Applications, Chapter 3: Speech Feature Extraction)

