A Summary of Speech Data Augmentation Algorithms (with Code)

May 25, 2023, 9:28 PM • Artificial Intelligence

A summary of speech data augmentation algorithms
This project is available on GitHub: speech_data_augment

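1. Volume augmentation

1.1 volume_augment

Adapted from the Baidu DeepSpeech2 source code, with one improvement: the data type is preserved across the gain.

The volume gain ranges over roughly [0.316, 3.16]. Sampling is non-uniform: the dB value is drawn uniformly, so the linear gain is log-uniform. Lowering the base (10) of the power function narrows the range.

Compared with sampling the linear gain uniformly, the power function makes it more likely that the gained audio stays close to the original.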
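
The relation between the dB range and the linear range quoted above can be checked with a short sketch. The helper name db_to_linear is made up for this illustration and is not part of the project; the -10 dB to 10 dB range is the default of volume_augment.

```python
import numpy as np

def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (np.asarray(gain_db, dtype=np.float64) / 20.0)

# Endpoints of the default range used in this section.
low, high = db_to_linear(-10), db_to_linear(10)
print(round(float(low), 3), round(float(high), 3))  # 0.316 3.162

# Sampling the dB value uniformly makes the linear gain log-uniform:
# about half of the draws attenuate (gain < 1) and half amplify.
rng = np.random.default_rng(0)
gains = db_to_linear(rng.uniform(-10, 10, size=100_000))
print(float(np.mean(gains < 1.0)))
```

Because the gain is symmetric in dB, attenuation and amplification are equally likely, even though the linear interval [0.316, 1] is much shorter than [1, 3.16].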

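import numpy as np

def volume_augment(samples, min_gain_dBFS=-10, max_gain_dBFS=10):
    """
    Apply a random volume gain. The linear gain lies in roughly [0.316, 3.16];
    it is log-uniform because the dB value is sampled uniformly.
    Lowering the base (10.) of the power function narrows the range.
    :param samples: audio data, one-dimensional NumPy array
    :param min_gain_dBFS: minimum gain in dB
    :param max_gain_dBFS: maximum gain in dB
    :return: gained audio with the same dtype as the input
    """
    samples = samples.copy()
    data_type = samples.dtype  # remember the dtype so it can be restored
    gain_db = np.random.uniform(min_gain_dBFS, max_gain_dBFS)
    gain = 10. ** (gain_db / 20.)  # dB -> linear amplitude factor
    samples = samples * gain

    samples = samples.astype(data_type)
    return samples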

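1.2 Results

gain = 2.4986395586019112

In each pair of figures, the top/left one shows the original data and the bottom/right one shows the data after augmentation. The same applies below.

1.2.1 Waveform

Original audio

Waveform after volume perturbation (about 2.5×)

1.2.2 Spectrogram

Original features
Features after volume perturbation

1.3 Conclusions

In the waveform, the change in amplitude is obvious.
In the spectrogram, the overall color intensity changes only slightly; the difference is hard to see.
When the gain is too large, the audio distorts: the waveform exceeds the amplitude range, and the spectrogram shows abrupt changes. This can be used to simulate the clipping distortion that may occur in real audio.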
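
If the overflow described in the last point is not wanted, one option is to saturate the scaled values at the limits of the integer type before casting back. The following is only a sketch under that assumption; apply_gain_clipped is a hypothetical helper, not a function from the project.

```python
import numpy as np

def apply_gain_clipped(samples, gain):
    """Scale samples by a linear gain, saturating integer types at their limits."""
    samples = np.asarray(samples)
    scaled = samples.astype(np.float64) * gain
    if np.issubdtype(samples.dtype, np.integer):
        info = np.iinfo(samples.dtype)
        # Round, then clamp to the representable range before casting back.
        scaled = np.clip(np.rint(scaled), info.min, info.max)
    return scaled.astype(samples.dtype)

audio = np.array([1000, -20000, 30000], dtype=np.int16)
print(apply_gain_clipped(audio, 2.5).tolist())  # [2500, -32768, 32767]
```

Values that would leave the int16 range are held at -32768 or 32767, which behaves like hard clipping rather than wrapping around.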

2. Speed augmentation

2.1 speed_numpy

Uses NumPy linear interpolation.

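import numpy as np

def speed_numpy(samples, min_speed=0.9, max_speed=1.1):
    """
    Speed perturbation by linear interpolation.
    :param samples: audio data, one-dimensional
    :param min_speed: should not be lower than 0.9; lower values sound poor
    :param max_speed: should not be higher than 1.1; higher values sound poor
    :return: resampled audio with the same dtype as the input
    """
    samples = samples.copy()
    data_type = samples.dtype
    speed = np.random.uniform(min_speed, max_speed)
    old_length = samples.shape[0]
    new_length = int(old_length / speed)
    old_indices = np.arange(old_length)
    # Stop at the last valid index so no query point falls outside the data.
    new_indices = np.linspace(start=0, stop=old_length - 1, num=new_length)
    samples = np.interp(new_indices, old_indices, samples)
    samples = samples.astype(data_type)
    return samples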

2.2 librosa

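import librosa
import numpy as np

def speed_librosa(samples, min_speed=0.9, max_speed=1.1):
    """
    Time stretching with librosa.
    :param samples: audio data, one-dimensional
    :param min_speed: should not be lower than 0.9; lower values sound poor
    :param max_speed: should not be higher than 1.1; higher values sound poor
    :return: stretched audio with the same dtype as the input
    """
    samples = samples.copy()
    data_type = samples.dtype

    speed = np.random.uniform(min_speed, max_speed)
    samples = samples.astype(np.float32)  # np.float was removed in NumPy 1.24
    # rate is passed by keyword; rate > 1 speeds the audio up
    samples = librosa.effects.time_stretch(samples, rate=speed)
    samples = samples.astype(data_type)
    return samples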

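2.3 Results

speed = 2.0. A large value is used here to make the effect visible; in a real project the speed should not be set too fast or too slow.

2.3.1 Waveform

Original waveform

DeepSpeech

librosa

2.3.2 Spectrogram

Original features
DeepSpeech
librosa

2.4 Summary

Listening results:
With the DeepSpeech method (speed_numpy), the pitch changes along with the speaking rate: it rises when the audio is sped up and falls when it is slowed down. Within [0.9, 1.1] the result sounds good; outside this range the voice (mainly the pitch) sounds unnatural.
With librosa, the pitch does not change, but the voice is less clear.
After the DeepSpeech transform, the waveform changes very little and the acoustic features (spectrogram) remain distinct; the audio sounds clear, but the pitch changes as well.
After the librosa transform, the waveform changes considerably and the acoustic features (spectrogram) are blurred; the audio sounds unclear.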
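
The pitch change of the interpolation method can be reproduced on a synthetic tone. The sketch below is an illustration only: change_speed mirrors the interpolation idea with a fixed speed, and peak_frequency is a hypothetical helper that reads the strongest FFT bin. A 440 Hz tone sped up by 1.1 should peak at roughly 484 Hz.

```python
import numpy as np

def change_speed(samples, speed):
    """Resample by linear interpolation; speed > 1 shortens the signal."""
    old_length = samples.shape[0]
    new_length = int(old_length / speed)
    new_indices = np.linspace(0, old_length - 1, num=new_length)
    return np.interp(new_indices, np.arange(old_length), samples)

def peak_frequency(samples, sr):
    """Return the frequency (Hz) of the largest FFT magnitude bin."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(samples.shape[0], d=1.0 / sr)
    return float(freqs[np.argmax(spectrum)])

sr = 16000
t = np.arange(sr) / sr                      # one second of audio
tone = np.sin(2 * np.pi * 440.0 * t)        # 440 Hz test tone
fast = change_speed(tone, speed=1.1)

print(len(tone), len(fast))                 # the sped-up signal is shorter
print(peak_frequency(tone, sr), peak_frequency(fast, sr))
```

Playing the shorter signal at the same sample rate compresses every period, which is why speed and pitch move together in this method.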

3. Pitch augmentation

3.1 librosa

import random

import librosa

def pitch_librosa(samples, sr=16000, ratio=5):
    samples = samples.copy()
    data_type = samples.dtype
    samples = samples.astype('float')
    # ratio is the maximum shift in semitones; draw a shift in [-ratio, ratio]
    n_steps = random.uniform(-ratio, ratio)
    samples = librosa.effects.pitch_shift(samples, sr=sr, n_steps=n_steps)
    samples = samples.astype(data_type)
    return samples

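3.2 Results

ratio = 5

3.2.1 Waveform

3.2.2 Spectrogram

Original features
Features after pitch adjustment

3.3 Conclusions

Audibly, the pitch changes while the duration stays the same.
The waveform clearly shows the frequency increasing (pitch raised).
In the spectrogram, the acoustic features become blurred. (Is this a common problem with audio processed by librosa? Whether it helps training remains to be tested; the proportion of samples it is applied to can be reduced.)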
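
To get a feel for what ratio = 5 means: librosa's n_steps counts semitones by default (12 steps per octave), so a shift of n semitones multiplies every frequency by 2^(n/12). A small sketch, with semitones_to_ratio as a helper name made up for this illustration:

```python
def semitones_to_ratio(n_steps):
    """Frequency ratio for a shift of n_steps semitones in equal temperament."""
    return 2.0 ** (n_steps / 12.0)

for steps in (-5, 0, 5):
    print(steps, round(semitones_to_ratio(steps), 3))
# -5 0.749
# 0 1.0
# 5 1.335
```

A shift of 5 semitones therefore raises all frequencies by about a third, which is a strong change for speech and may explain why smaller values sound more natural.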

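4. Shift augmentation

4.1 time_shift

Improvements:

The offset is random within a proportional range instead of a fixed time offset.
The shift is circular instead of zero-filling the gap.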

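import numpy as np

def time_shift(samples, max_ratio=0.05):
    """
    Improvements:
    1. The offset is drawn at random within a proportion of the length, so no fixed time offset is needed
    2. The shift is circular
    :param samples: audio data
    :param max_ratio: maximum shift as a fraction of the length
    :return: shifted audio
    """
    samples = samples.copy()
    frame_num = samples.shape[0]
    max_shifts = int(frame_num * max_ratio)
    # randint needs integer bounds; the upper bound is exclusive
    shifts_num = np.random.randint(-max_shifts, max_shifts + 1)
    if shifts_num > 0:
        # copy the head first: the slice is a view that the next line overwrites
        temp = samples[:shifts_num].copy()
        samples[:-shifts_num] = samples[shifts_num:]
        samples[-shifts_num:] = temp
    elif shifts_num < 0:
        # copy the tail first for the same reason
        temp = samples[shifts_num:].copy()
        samples[-shifts_num:] = samples[:shifts_num]
        samples[:-shifts_num] = temp
    return samples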

4.2 numpy

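import numpy as np

def time_shift_numpy(samples, max_ratio=0.05):
    """
    Randomly roll the data along the time axis, within ±5% of its length by default.
    The shift wraps around, so no information is lost.
    :param max_ratio: maximum shift as a fraction of the length
    :param samples: audio data, one-dimensional (sequence length,), or feature data (sequence length, feature dimension)
    :return: shifted data with the same dtype as the input
    """
    samples = samples.copy()
    data_type = samples.dtype
    frame_num = samples.shape[0]
    max_shifts = int(frame_num * max_ratio)
    # randint needs integer bounds; the upper bound is exclusive
    nb_shifts = np.random.randint(-max_shifts, max_shifts + 1)
    samples = np.roll(samples, nb_shifts, axis=0)
    samples = samples.astype(data_type)
    return samples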

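4.3 Results

Offset: 5%

4.3.1 Waveform

Original waveform
Waveform after shifting

4.3.2 Feature map

Original features
Features after shifting

4.4 Summary

With Baidu's method improved as above, the two methods produce the same kind of shift and differ only in implementation, so only their running speed needs to be compared.
Only silent segments should be shifted, not speech segments: shifting speech loses or scrambles its ordering, so the number of shifted frames should be kept very small.
This shift perturbation does not affect the features of the speech segments.
In theory, shift perturbation improves speech recognition only slightly.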
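
The claim that a circular shift keeps all information, and that slicing and np.roll give the same result, can be checked on a tiny array. This is a sketch with a fixed shift; roll_by_ratio is a hypothetical helper, not one of the functions above.

```python
import numpy as np

def roll_by_ratio(samples, ratio):
    """Circularly shift along axis 0 by a signed fraction of the length."""
    shift = int(round(samples.shape[0] * ratio))
    return np.roll(samples, shift, axis=0)

x = np.arange(10)
y = roll_by_ratio(x, 0.2)
print(y.tolist())                          # [8, 9, 0, 1, 2, 3, 4, 5, 6, 7]

# The same shift written with slicing, as in the manual implementation.
manual = np.concatenate((x[-2:], x[:-2]))
print(np.array_equal(y, manual))           # True

# Nothing is lost: the shifted array holds exactly the same values.
print(sorted(y.tolist()) == x.tolist())    # True
```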

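5. Noise augmentation

5.1 Natural noise

Requires a large number of noise audio files.

Advantage: covers more scenarios, such as parks, human voices, and electrical hum.
Disadvantage: needs a large amount of noise data; too little data hurts generalization.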

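import numpy as np

def noise_augmentation(samples, noise_list, max_db=0.5):
    """
    Overlay natural noise.
    :param samples: speech samples
    :param noise_list: list of noise files
    :param max_db: maximum noise gain (used as a linear scale factor, despite the name)
    :return: noisy speech with the same dtype as the input
    """
    samples = samples.copy()
    data_type = samples.dtype
    noise_path = np.random.choice(noise_list)

    db = np.random.uniform(low=0.1, high=max_db)
    # read_wave_from_file is not defined in this article; presumably a helper from the project
    aug_noise, fs = read_wave_from_file(noise_path)

    # repeat the noise until it is strictly longer than the speech
    while len(aug_noise) <= len(samples):
        aug_noise = np.concatenate((aug_noise, aug_noise), axis=0)

    diff_len = len(aug_noise) - len(samples)
    start = np.random.randint(0, diff_len)
    end = start + len(samples)

    samples = samples + db * aug_noise[start:end]
    samples = samples.astype(data_type)
    return samples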
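
The function above scales the noise by a fixed linear factor. A different technique, not the one used in this project, is to scale the noise so that the mix reaches a target signal-to-noise ratio in dB. The sketch below assumes float inputs of equal length; mix_at_snr and the synthetic signals are made up for this illustration.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mix has the requested SNR in dB.

    Both inputs are arrays of the same length.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose scale so that speech_power / (scale**2 * noise_power) == 10**(snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220.0 * np.arange(16000) / 16000)   # stand-in for speech
noise = rng.normal(0.0, 1.0, size=16000)                        # stand-in for recorded noise
noisy = mix_at_snr(speech, noise, snr_db=10)

# Measure the achieved SNR from the residual (noisy - speech).
residual = noisy - speech
print(10 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2)))
```

Specifying the level as an SNR makes the amount of noise independent of how loud the speech and noise recordings happen to be.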

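5.2 Artificial noise

Advantage: can be generated randomly; no large dataset is needed.
Disadvantage: methods that perform well on artificial noise may not work well on real-world noise datasets.

5.2.1 Gaussian white noise

White noise is a sequence of random samples taken at regular intervals; here they are drawn from a distribution with mean 0 and standard deviation 1.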

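import numpy as np

def gaussian_white_noise_numpy(samples, min_db=10, max_db=500):
    """
    Gaussian white noise.
    Noise level db (used as a linear amplitude scale, despite the name):
        db = 10: inaudible
        db = 100: audible but very quiet
        db = 500: loud
        the speech remains clear in all cases
    :param samples: audio data, one-dimensional
    :param max_db: upper bound (exclusive) of the noise scale
    :param min_db: lower bound of the noise scale
    :return: noisy audio with the same dtype as the input
    """
    samples = samples.copy()
    data_type = samples.dtype
    db = np.random.randint(low=min_db, high=max_db)
    noise = db * np.random.normal(0, 1, len(samples))  # zero mean, standard deviation db
    samples = samples + noise
    samples = samples.astype(data_type)
    return samples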

5.2.2 Uniform white noise

import numpy as np

def uniform_white_noise_numpy(samples, min_db=10, max_db=500):
    """
    Uniform white noise.
    :param samples: audio data, one-dimensional
    :param max_db: upper bound (exclusive) of the noise scale
    :param min_db: lower bound of the noise scale
    :return: noisy audio with the same dtype as the input
    """
    samples = samples.copy()  # frombuffer() returns a read-only array, so work on a copy
    data_type = samples.dtype
    db = np.random.randint(low=min_db, high=max_db)
    noise = np.random.uniform(low=-db, high=db, size=len(samples))  # uniform distribution on [-db, db)
    samples = samples + noise
    samples = samples.astype(data_type)
    return samples
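
Note that the same db value gives different noise power in the two functions above: Gaussian noise scaled by db has standard deviation db, while uniform noise on [-db, db] has standard deviation db / sqrt(3). A quick numerical check, as a sketch with an arbitrary scale of 100:

```python
import numpy as np

rng = np.random.default_rng(0)
scale = 100          # plays the role of `db` in the two functions above
n = 200_000

gaussian = scale * rng.normal(0.0, 1.0, size=n)
uniform = rng.uniform(-scale, scale, size=n)

# Gaussian noise has standard deviation `scale`; uniform noise on
# [-scale, scale] has standard deviation scale / sqrt(3).
print(gaussian.std(), uniform.std(), scale / np.sqrt(3))

# Uniform noise is bounded by the scale, Gaussian noise is not.
print(np.abs(uniform).max() <= scale, np.abs(gaussian).max() > scale)
```

So for equal db the uniform noise is quieter, which is worth keeping in mind when comparing the two.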

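5.3 Results

Natural noise: db = 0.5

Synthetic noise: db = 500. In practice, do not set it this high.

5.3.1 Waveform

Original waveform
Natural noise
Speech with natural noise added
Gaussian white noise
Uniform white noise

5.3.2 Spectrogram

Original features
Natural noise
Speech with natural noise added
Gaussian white noise
Uniform white noise

5.4 Conclusions

Artificial noise and natural noise can be used together.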

6. Time masking

6.1 Google

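import random

import numpy as np

def time_mask_augment(inputs, max_mask_time=5, mask_num=10):
    """
    Time masking. Note: inputs is modified in place.
    :param inputs: three-dimensional NumPy array or tensor, (batch, time_step, feature_dim)
    :param max_mask_time: upper bound (exclusive) on the length of each mask, in frames
    :param mask_num: number of masks
    :return: the masked inputs
    """
    time_len = inputs.shape[1]
    for i in range(mask_num):
        t = np.random.uniform(low=0.0, high=max_mask_time)
        t = int(t)
        t0 = random.randint(0, time_len - t)  # inclusive on both ends
        inputs[:, t0:t0 + t, :] = 0

    return inputs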
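
Because the function above writes into its argument, the caller's array changes as well. When the original features are still needed, a copy-based variant can be used. The following is a sketch on dummy data; time_mask and its rng parameter are assumptions of this example, not part of the project.

```python
import numpy as np

def time_mask(features, max_mask_time=5, mask_num=10, rng=None):
    """Return a copy of features with random spans along axis 1 set to zero."""
    rng = np.random.default_rng() if rng is None else rng
    masked = features.copy()
    time_len = masked.shape[1]
    for _ in range(mask_num):
        width = int(rng.integers(0, max_mask_time))        # 0 .. max_mask_time - 1
        start = int(rng.integers(0, time_len - width + 1))
        masked[:, start:start + width, :] = 0
    return masked

batch = np.ones((2, 100, 40), dtype=np.float32)   # (batch, time_step, feature_dim)
out = time_mask(batch, rng=np.random.default_rng(0))

masked_frames = int((out[0, :, 0] == 0).sum())
print(masked_frames, float(batch.min()))           # the input batch is untouched
```

With the defaults, at most mask_num * (max_mask_time - 1) = 40 of the 100 frames can be masked; overlapping masks make the actual count smaller.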

6.2 Results

6.2.1 Feature map

Original features
Time mask

7. Frequency masking

7.1 Google

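import random

import numpy as np

def frequency_mask_augment(inputs, max_mask_frequency=5, mask_num=10):
    """
    Frequency masking. Note: inputs is modified in place.
    :param inputs: three-dimensional NumPy array or tensor, (batch, time_step, feature_dim)
    :param max_mask_frequency: upper bound (exclusive) on the width of each mask, in feature bins
    :param mask_num: number of masks
    :return: the masked inputs
    """
    feature_len = inputs.shape[2]
    for i in range(mask_num):
        f = np.random.uniform(low=0.0, high=max_mask_frequency)
        f = int(f)
        f0 = random.randint(0, feature_len - f)  # inclusive on both ends
        inputs[:, :, f0:f0 + f] = 0
    return inputs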
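
Time and frequency masking differ only in the axis they act on, so one generic helper can cover both and they can be applied one after the other. This is a sketch under that observation; mask_along_axis and its parameters are hypothetical and not part of the project.

```python
import numpy as np

def mask_along_axis(features, axis, max_width, mask_num, rng):
    """Zero out mask_num random bands of width < max_width along one axis."""
    masked = features.copy()
    length = masked.shape[axis]
    for _ in range(mask_num):
        width = int(rng.integers(0, max_width))
        start = int(rng.integers(0, length - width + 1))
        # Build an index that selects the band on `axis` and everything elsewhere.
        index = [slice(None)] * masked.ndim
        index[axis] = slice(start, start + width)
        masked[tuple(index)] = 0
    return masked

rng = np.random.default_rng(0)
batch = np.ones((2, 100, 40), dtype=np.float32)   # (batch, time_step, feature_dim)
out = mask_along_axis(batch, axis=2, max_width=5, mask_num=2, rng=rng)  # frequency bands
out = mask_along_axis(out, axis=1, max_width=5, mask_num=2, rng=rng)    # time spans
print(out.shape, float(out.mean()))
```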

7.2 Results

7.2.1 Feature map

Original features
Frequency mask

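References

[1] DeepSpeech2 (volume perturbation, speed perturbation, shift perturbation, online Bayesian normalization, noise addition, impulse response)
[2] GitHub / CSDN: Sound data augmentation (time, pitch)
[3] CSDN: Audio data augmentation processing (time, pitch, random Gaussian noise)
[4] CSDN: Data augmentation for audio in Python: do you know how to use it? (cropping, rotation, pitch adjustment, noise addition)
[5] GitHub: pydub (compression, equalizer (EQ), speed change, sine, square, sawtooth, white noise, etc., silence detection); CSDN: Chinese documentation for pydub (including the API)
[6] cnblogs: Audio data augmentation and its Python implementation, by 凌逆战 (noise addition, waveform shifting, waveform stretching, pitch correction)
[7] Zhihu: Data augmentation for audio signals with partial code (mixing, noise addition, time shift, pitch, other resources)
[8] Audio data augmentation with Python (noise addition, time shift, speed change, pitch)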

Original: https://blog.csdn.net/qq_36999834/article/details/109851965
Author: 绝版小哥
Title: A Summary of Speech Data Augmentation Algorithms (with Code)
