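Deep learning voiceprint recognition: a voiceprint noise reduction method, system and process based on machine learning and deep learning…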

The invention belongs to the field of speech-to-text recognition and voiceprint noise reduction, and relates to a voiceprint noise reduction method and system based on machine learning and deep learning.

Background:

Intelligent data collection is a foundational step in building geoscience big data and plays an important role in day-to-day geological production. A key part of it is collecting the survey data that geologists record when they observe and describe geological objects in the field. To help geologists capture these observations quickly, earlier geoscience data acquisition systems put great emphasis on efficient, convenient data entry, but traditional keyboard text entry is slow and impractical in the field. Voice input with speech-to-text recognition has therefore been adopted to speed up data collection. In practice, however, field recordings may contain wind, rain and animal sounds, and drilling sites, mines and oil and gas production sites add heavy machinery noise. When such noise mixes with the speaker's voice, the text recognition accuracy of current speech recognition systems drops sharply. As a result, voice input and speech-to-text recognition are very inaccurate in field geoscience data collection, and current geoscience data acquisition systems run inefficiently and have poor usability in these special geological working environments.

Summary of the invention:

Given these shortcomings, there is an urgent need for a voice-input noise reduction technique that removes as much environmental noise as possible in field geological working environments, laying the groundwork for more accurate speech-to-text recognition downstream. The invention provides a voiceprint noise reduction method based on machine learning and deep learning. It addresses the technical problem that, when geoscience data is collected by voice in the field, background noise is high and the useful speech is hard to recognize accurately. The method comprises the following steps:

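S1. Acquire audio of a specific person describing geological phenomena and geological understanding in a field working environment;

S2. Feed the audio acquired in step S1 to the machine learning and deep learning model, which identifies and distinguishes the ambient sounds mixed into the speech audio;

S3. Filter the speech identified in step S2, removing ambient sounds that do not belong to the specific person's speech, to obtain preliminarily screened speech;

S4. Determine whether the signal-to-noise ratio of the preliminarily screened speech reaches a preset threshold; if not, return to step S3; if so, continue to step S5;

S5. Compare the speech from step S4 against the person's voiceprint recognition model, retain the speech frequencies and spectrogram content that match the model, and remove speech that does not match, to obtain voiceprint-denoised speech;

S6. Determine whether the voiceprint purity of the voiceprint-denoised speech reaches a preset threshold; if not, return to step S5; if so, continue to step S7;

S7. Enhance the voiceprint-denoised speech from step S6;

S8. Determine whether the clarity of the enhanced speech reaches a preset threshold; if not, return to step S7; if so, continue to step S9;

S9. Output the resulting speech from step S8 to the speech-to-text recognition system for subsequent processing.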
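The control flow of steps S3 to S9 can be sketched as three "apply a stage, then check a threshold" loops. This is only an illustrative outline: the names `refine_until` and `denoise_pipeline`, the threshold keys, and the `max_rounds` safeguard are assumptions introduced here, and the stage and scoring functions are placeholders for whatever models and metrics an implementation actually uses.

```python
from typing import Callable, Dict, List

Audio = List[float]  # mono samples; a stand-in type for this sketch


def refine_until(stage: Callable[[Audio], Audio],
                 score: Callable[[Audio], float],
                 threshold: float,
                 audio: Audio,
                 max_rounds: int = 10) -> Audio:
    """Apply `stage` once, then repeat it while `score` stays below `threshold`.

    `max_rounds` is an added safeguard (not part of the described method) so
    that a stage which can never reach its threshold does not loop forever.
    """
    audio = stage(audio)
    rounds = 1
    while score(audio) < threshold and rounds < max_rounds:
        audio = stage(audio)
        rounds += 1
    return audio


def denoise_pipeline(audio: Audio,
                     filter_env: Callable[[Audio], Audio],
                     snr: Callable[[Audio], float],
                     extract_voiceprint: Callable[[Audio], Audio],
                     purity: Callable[[Audio], float],
                     enhance: Callable[[Audio], Audio],
                     clarity: Callable[[Audio], float],
                     thresholds: Dict[str, float]) -> Audio:
    """Chain the three filter-and-check loops of steps S3 to S8."""
    audio = refine_until(filter_env, snr, thresholds["snr"], audio)             # S3, S4
    audio = refine_until(extract_voiceprint, purity, thresholds["purity"], audio)  # S5, S6
    audio = refine_until(enhance, clarity, thresholds["clarity"], audio)        # S7, S8
    return audio  # S9: hand this to the speech-to-text system
```

Passing the stages in as functions keeps the loop logic separate from the models, so each threshold check can be tuned or replaced independently.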

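In the voiceprint noise reduction method of the invention, building the machine learning and deep learning model precedes step S1. Specifically, a large amount of human speech audio is collected, all of it is converted into spectrograms and loaded into a computer, and the model is trained repeatedly on this data until it can identify and distinguish the ambient sounds mixed into speech audio, yielding the trained machine learning and deep learning model.

In the voiceprint noise reduction method of the invention, building the voiceprint recognition model for the specific speaker precedes step S5. Specifically, voiceprint spectrograms are built from existing recordings of that speaker, and features extracted from these spectrograms are used to build a voiceprint recognition model that belongs only to that person. Repeated training on a large amount of that person's voiceprint data yields a model with a high voiceprint recognition rate.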

According to another aspect of the invention, to solve the same technical problem, the invention provides a voiceprint noise reduction system based on machine learning and deep learning, comprising the following modules:

Initialization module: acquires audio of a specific person describing geological phenomena and geological knowledge in a field working environment.

Machine learning and deep learning recognition module: passes the speech audio acquired by the initialization module through the machine learning and deep learning models to identify and distinguish the ambient sounds mixed into it.

Speech filtering module: filters the speech identified by the machine learning and deep learning recognition module, removing ambient sounds that do not belong to the specific person's speech, to obtain preliminarily screened speech.

Speech filtering judgment module: determines whether the signal-to-noise ratio of the filtered speech reaches a preset threshold; if not, control returns to the speech filtering module; if so, it passes to the voiceprint comparison and extraction module.

Voiceprint comparison and extraction module: compares the speech passed by the speech filtering judgment module against the person's voiceprint recognition model, retains the speech frequencies and spectrogram content that match the model, and removes speech that does not match, to obtain voiceprint-denoised speech.

Speech purity judgment module: determines whether the purity of the extracted voiceprint reaches a preset threshold; if not, control returns to the voiceprint comparison and extraction module; if so, it passes to the speech enhancement module.

Speech enhancement module: further enhances the denoised speech produced by the voiceprint comparison and extraction module.

Speech clarity judgment module: determines whether the clarity of the enhanced speech reaches a preset threshold; if not, control returns to the speech enhancement module; if so, it passes to the speech output module.

Speech output module: outputs the result from the speech enhancement module to the speech-to-text recognition system for subsequent processing.

In the voiceprint noise reduction system based on machine learning and deep learning, the following also takes place before the initialization module runs:

The machine learning and deep learning model is built. Specifically, speech from a large number of people talking in noisy field environments is collected, all of it is converted into spectrograms and loaded into a computer, and the model is trained repeatedly on this data until it can distinguish the ambient sounds in human speech audio, yielding the trained machine learning and deep learning model.

In the voiceprint noise reduction system based on machine learning and deep learning, the voiceprint recognition model for the specific speaker is built before the voiceprint comparison and extraction module runs. Specifically, voiceprint spectrograms are built from existing recordings of that speaker, and features extracted from these spectrograms are used to build a voiceprint recognition model that belongs only to that person. Repeated training on a large amount of that person's voiceprint data yields a model with a high voiceprint recognition rate.

The invention adopts a voiceprint noise reduction method and system based on machine learning and deep learning. It targets the low noise reduction rate that conventional noise reduction techniques or simple speech denoising algorithms achieve in complex field geological environments with widely varying signal-to-noise ratios, and so provides a sound basis for the speech-to-text recognition stage of later geoscience data collection. It offers technical support for reducing the difficulty, and greatly improving the accuracy, of speech-to-text recognition during geoscience data collection.

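Brief description of the drawings

The invention is further described below with reference to the drawings and embodiments. In the drawings:

Figure 1 is a flowchart of voiceprint modeling for a GMM-UBM speaker verification system according to an embodiment of the invention;

Figure 2 is a flowchart of MFCC feature vector extraction according to an embodiment of the invention;

Figure 3 is a flowchart of comparing the voiceprint recognition model with the input speech according to an embodiment of the invention;

Figure 4 is a flowchart of the voiceprint noise reduction method based on machine learning and deep learning according to an embodiment of the invention.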

Detailed description

To make the purpose, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments.

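See Figure 1 for the complete flowchart. First, the machine learning and deep learning model is obtained, in two steps. In the first step, the model is built from a large collection of natural ambient sounds recorded in field geoscience environments together with a large number of speech segments from the specific person. In the second step, the model is trained: the ambient sounds and the specific person's speech audio are all converted into spectrograms and loaded into a computer, and through extensive repeated training the model learns to tell ambient-sound spectrograms apart from spectrograms of the specific person's speech.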

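Next, the voiceprint recognition model is built. Each person's distinctive voiceprint can be observed in a spectrogram. Recordings of the specific speaker's voice are collected, voiceprint spectrograms are built from them, and features extracted from those spectrograms are used to build a voiceprint recognition model that belongs only to that person. Voiceprint modeling methods fall into three types: text-dependent, text-independent (GMM-UBM, GMM-SVM, GMM-UBM-LFA, i-vector/PLDA) and text-prompted. Because the content of the input speech cannot be fixed in advance, the text-independent type is chosen to model the person's voiceprint. This embodiment uses GMM-UBM to build the speaker verification system's voiceprint model; see Figure 2 for the flowchart. Speech from multiple speakers and test speech are taken as input, MFCC feature vectors are extracted, and after repeated training on a large amount of voiceprint data, MAP adaptation and the verification decision, a voiceprint recognition model with a high recognition rate is obtained for that person. The MFCC feature extraction process is shown in Figure 3: take the sample audio as input; apply pre-emphasis, framing and windowing; apply the Fourier transform to the processed audio; apply Mel-frequency filtering; take the logarithm of the filter energies; compute the cepstrum; and output the MFCC image.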
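The MFCC extraction chain just described (pre-emphasis, framing, windowing, Fourier transform, Mel filtering, log energy, cepstrum) can be sketched in plain Python as follows. The parameter values (pre-emphasis coefficient 0.97, Hamming window, 20 filters, 12 coefficients) are common defaults assumed for illustration, not values given in the text, and the naive DFT is chosen only to keep the sketch dependency-free; a real system would use an FFT library.

```python
import cmath
import math
from typing import List, Sequence


def hz_to_mel(hz: float) -> float:
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame: Sequence[float], n_fft: int) -> List[float]:
    """Naive DFT power spectrum for bins 0..n_fft/2 (slow but dependency-free)."""
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, x in enumerate(padded))
        spec.append(abs(acc) ** 2 / n_fft)
    return spec


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters whose centres are evenly spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2)
    hz_points = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * hz / sample_rate) for hz in hz_points]
    bank = [[0.0] * (n_fft // 2 + 1) for _ in range(n_filters)]
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):      # rising edge of the triangle
            bank[m - 1][k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            bank[m - 1][k] = (right - k) / (right - centre)
    return bank


def mfcc(signal: Sequence[float], sample_rate: int, frame_len: int, hop: int,
         n_fft: int = 256, n_filters: int = 20, n_ceps: int = 12,
         pre_emph: float = 0.97) -> List[List[float]]:
    """Return one list of `n_ceps` coefficients per frame (needs frame_len <= n_fft)."""
    # Pre-emphasis: y[n] = x[n] - a * x[n-1]
    emphasised = [signal[0]] + [signal[n] - pre_emph * signal[n - 1]
                                for n in range(1, len(signal))]
    # Hamming window applied to every frame
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(emphasised) - frame_len + 1, hop):    # framing
        frame = [emphasised[start + n] * window[n] for n in range(frame_len)]
        spec = power_spectrum(frame, n_fft)                         # Fourier transform
        energies = [sum(w * p for w, p in zip(filt, spec)) for filt in bank]  # Mel filtering
        log_e = [math.log(max(e, 1e-10)) for e in energies]         # log energy (floored)
        ceps = [sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                    for m in range(n_filters))
                for c in range(n_ceps)]                             # DCT-II gives the cepstrum
        features.append(ceps)
    return features
```

For example, `mfcc(samples, 8000, frame_len=200, hop=80)` treats the audio as 25 ms frames with a 10 ms hop at an 8 kHz sampling rate; stacking the per-frame vectors over time gives the MFCC image mentioned in the text.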

In the field, the voice recording system feeds audio clips of the specific person describing geological phenomena and geological knowledge to the machine learning and deep learning model as spectrograms. The model identifies the ambient noise in the speech and removes it, that is, it removes the sounds in each clip that do not belong to the human voice. This processing is repeated in a loop, with a check after each pass on whether the denoised speech is acceptable: a signal-to-noise ratio threshold is set in advance; once the speech reaches it, processing moves on to the next step; otherwise noise filtering continues.
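The signal-to-noise check can be illustrated with the usual definition, SNR in decibels as ten times the base-10 logarithm of the speech-to-noise power ratio. The sketch below assumes that a separate estimate of the residual noise is available (the text does not say how it is obtained), and the 15 dB default threshold and the function names are hypothetical.

```python
import math
from typing import Sequence


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0.0:
        return math.inf       # no residual noise at all
    if p_speech == 0.0:
        return -math.inf      # nothing but noise
    return 10.0 * math.log10(p_speech / p_noise)


def passes_snr_check(filtered: Sequence[float],
                     noise_estimate: Sequence[float],
                     threshold_db: float = 15.0) -> bool:
    """True when the filtered speech reaches the preset SNR threshold."""
    return snr_db(filtered, noise_estimate) >= threshold_db
```

When `passes_snr_check` returns `False`, the filtering step would be run again, which matches the loop described above.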

The filtered speech is then compared with the voiceprint recognition model built earlier; the flowchart is shown in Figure 4. The input voiceprint is compared against the model, and the speech frequencies and spectrogram content consistent with the model are retained. Speech that does not match the model is removed over multiple passes, and the extracted speech is checked against a preset voiceprint purity threshold. Once the purity reaches the threshold, the spectrogram is analyzed for any remaining noise other than the person's voice: if any is found, voiceprint noise reduction continues; if none is found, the speech is taken as the voiceprint-denoised result.
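One common way to compare feature frames against a GMM-UBM speaker model is the log-likelihood ratio between the speaker's adapted mixture and the universal background model. The text does not spell out its comparison rule, so the following is a sketch under that assumption: the diagonal-covariance mixtures, the zero threshold, and the names `llr_score` and `keep_matching_frames` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# One mixture component: (weight, means, variances), diagonal covariance.
Component = Tuple[float, Sequence[float], Sequence[float]]


def gmm_log_likelihood(x: Sequence[float], gmm: List[Component]) -> float:
    """log p(x | gmm) for a diagonal-covariance Gaussian mixture."""
    logs = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        logs.append(log_p)
    peak = max(logs)  # log-sum-exp keeps the sum numerically stable
    return peak + math.log(sum(math.exp(v - peak) for v in logs))


def llr_score(frames: Sequence[Sequence[float]],
              speaker_gmm: List[Component],
              ubm: List[Component]) -> float:
    """Average per-frame log-likelihood ratio of speaker model versus UBM."""
    total = sum(gmm_log_likelihood(f, speaker_gmm) - gmm_log_likelihood(f, ubm)
                for f in frames)
    return total / len(frames)


def keep_matching_frames(frames: Sequence[Sequence[float]],
                         speaker_gmm: List[Component],
                         ubm: List[Component],
                         threshold: float = 0.0) -> List[Sequence[float]]:
    """Keep only the frames that the speaker model explains better than the UBM."""
    return [f for f in frames
            if gmm_log_likelihood(f, speaker_gmm) - gmm_log_likelihood(f, ubm) > threshold]
```

In this reading, `llr_score` over the retained frames could serve as one possible purity measure to compare against the preset threshold.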

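Finally, a speech enhancement algorithm, for example an LMS adaptive filter, an LMS adaptive notch filter or Wiener filtering, is applied to the denoised speech to further strengthen and clarify the specific person's voice within the already fairly clean signal. The result is then checked for clarity. The criteria are as follows: if the amplitude in the spectrogram of a segment is very small, it is amplified; if parts of the spectrogram overlap, the algorithm finds a balance point at which the speech is made clear. At this balance point, loudness and clarity are such that the voice is neither too quiet nor distorted. Once the speech meets the criteria, the loop ends and the resulting speech is output to the speech-to-text recognition system for subsequent text recognition and storage.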
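Of the enhancement algorithms named here, the LMS adaptive filter is the simplest to sketch. The version below is the textbook adaptive noise canceller: it assumes a second, noise-only reference signal correlated with the noise in the primary recording, which the text does not mention, and the tap count and step size are arbitrary illustrative values.

```python
from typing import List, Sequence


def lms_noise_canceller(primary: Sequence[float],
                        reference: Sequence[float],
                        n_taps: int = 8,
                        mu: float = 0.01) -> List[float]:
    """Subtract an adaptively filtered noise reference from the primary signal.

    primary:   speech plus noise
    reference: noise-only signal correlated with the noise in `primary`
    Returns the error signal, which is the enhanced speech estimate.
    """
    weights = [0.0] * n_taps
    history = [0.0] * n_taps  # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate  # the error doubles as the cleaned sample
        # LMS update: move the weights along the instantaneous gradient
        weights = [w + mu * error * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned
```

The step size `mu` trades convergence speed against stability; it has to stay small relative to the reference signal's power for the weights to converge.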

According to an embodiment of the invention, the system comprises the following modules:

Initialization module: acquires speech audio of a specific person describing geological phenomena and geological knowledge in a field working environment.

Machine learning and deep learning recognition module: passes the speech audio acquired by the initialization module through the machine learning and deep learning models to identify and distinguish the ambient sounds mixed into it.

Speech filtering module: filters the speech identified by the machine learning and deep learning recognition module, removing ambient sounds that do not belong to human speech, to obtain preliminarily screened speech.

Speech filtering judgment module: determines whether the signal-to-noise ratio of the filtered speech reaches a preset threshold; if not, control returns to the speech filtering module; if so, it passes to the voiceprint comparison and extraction module.

Voiceprint comparison and extraction module: compares the speech passed by the speech filtering judgment module against the person's voiceprint recognition model, retains the speech frequencies and spectrogram content that match the model, and removes speech that does not match, to obtain voiceprint-denoised speech.

Speech purity judgment module: determines whether the purity of the extracted voiceprint reaches a preset threshold; if not, control returns to the voiceprint comparison and extraction module; if so, it passes to the speech enhancement module.

Speech enhancement module: further enhances the denoised speech produced by the voiceprint comparison and extraction module.

Speech clarity judgment module: determines whether the clarity of the enhanced speech reaches a preset threshold; if not, control returns to the speech enhancement module; if so, it passes to the speech output module.

Speech output module: outputs the result from the speech enhancement module to the speech-to-text recognition system for subsequent processing.

In the voiceprint noise reduction system based on machine learning and deep learning, the following also takes place before the initialization module runs:

The machine learning and deep learning model is built. Specifically, a large amount of human speech audio is collected, all of it is converted into spectrograms and loaded into a computer, and the model is trained repeatedly on this data until it can distinguish the ambient sounds in human speech audio, yielding the trained machine learning and deep learning model.

In the voiceprint noise reduction system based on machine learning and deep learning, the voiceprint recognition model for the specific speaker is built before the voiceprint comparison and extraction module runs. Specifically, voiceprint spectrograms are built from existing recordings of that speaker, and features extracted from these spectrograms are used to build a voiceprint recognition model that belongs only to that person. Repeated training on a large amount of voiceprint data yields a model with a high voiceprint recognition rate.

The embodiments of the invention have been described above with reference to the drawings, but the invention is not limited to these specific embodiments, which are illustrative rather than restrictive. Guided by the invention, a person of ordinary skill in the art can make many improvements and variations without departing from the purpose of the invention or the scope protected by the claims, and all of these fall within the protection of the invention.

Original: https://blog.csdn.net/weixin_36382999/article/details/111954906
Author: 戮萌
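Title: Deep learning voiceprint recognition: a voiceprint noise reduction method, system and process based on machine learning and deep learning…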
