[Hands-On Project] A Speech Recognition Classification Model in Python with librosa and an Artificial Neural Network (ANN)

Note: this is a hands-on machine learning project (with data, code, documentation and a video walkthrough). If you need the data, code, documentation and video walkthrough, you can get them at the end of the article.

1. Project Background

Speech recognition has become an important interface for human-computer interaction and has changed many aspects of daily life: from voice control in smart homes to in-vehicle speech recognition, it has brought a great deal of convenience. Since concepts such as big data and cloud computing were proposed and put into commercial use, and after decades of research in the field, many highly successful commercial products have appeared. Even so, the performance of speech recognition systems in real applications still falls well short of human hearing. This gap, together with market demand for efficient speech recognition systems, has drawn many researchers to the field, and many of them have achieved encouraging results.

Speech recognition is an ideal medium for human-computer interaction and a key technology for making machines more intelligent. However, traditional speech recognition rests on certain theoretical assumptions, while its application scenarios keep growing more complex, so many systems have hit a bottleneck in performance improvement. Overcoming these technical obstacles requires new theories and methods. Deep learning is an important approach to feature extraction, classification and recognition on large-scale data, and it is of great significance for improving the performance of speech recognition systems.

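2. Data Acquisition

The modeling data comes from the internet and was compiled by the author of this project. The data fields are summarized as follows:

[Figure: summary table of the data fields]

A sample of the data is shown below:

[Figure: preview of the data records]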

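3. Data Preprocessing

3.1 Inspecting the data with Pandas

Use the Pandas shape attribute and head() method to check the dimensions and view the first five rows:

[Figure: output of shape and head()]

As the figure shows, there are 8 data fields and 8,732 audio files.

Key code:

[Figure: screenshot of the shape and head() code]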

3.2 Viewing the audio classes

Use the Pandas groupby() method to list the audio classes:

[Figure: groupby() output listing the audio classes]

As the figure shows, there are 10 classes in total.

Key code:

[Figure: screenshot of the groupby() code]
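The inspection steps in 3.1 and 3.2 can be sketched as follows, assuming the metadata is held in a Pandas DataFrame. The tiny in-memory table and the column names `slice_file_name` and `class` are illustrative assumptions, not the author's actual file; with the real data the frame would typically come from `pd.read_csv()`.

```python
import pandas as pd

# Hypothetical stand-in for the metadata table (column names are assumptions).
df = pd.DataFrame({
    "slice_file_name": ["a.wav", "b.wav", "c.wav", "d.wav"],
    "class": ["dog_bark", "siren", "dog_bark", "drilling"],
})

# 3.1: dimensions as (rows, columns), then the first five rows.
print(df.shape)
print(df.head())

# 3.2: number of audio files per class.
class_counts = df.groupby("class").size()
print(class_counts)
print("number of classes:", class_counts.shape[0])
```

On the real metadata, the same two calls would be expected to report 8,732 rows, 8 columns and 10 classes, matching the figures quoted in this section.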

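4. Exploratory Data Analysis

4.1 Waveform visualization

Load the audio file with librosa's load() method and plot the waveform with waveplot() (newer librosa releases replace it with waveshow()), using 1.wav as an example.

[Figure: waveform of 1.wav]

4.2 Spectrogram visualization

Load the audio file with librosa's load() method and plot the spectrogram with librosa's specshow() and Matplotlib's colorbar(), using 1.wav as an example.

[Figure: spectrogram of 1.wav]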
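To make clear what a spectrogram plot displays, here is a minimal NumPy sketch of a magnitude spectrogram in decibels. It is not librosa's implementation (librosa's stft() also handles padding and other details); the function name `magnitude_spectrogram_db`, the synthetic 440 Hz tone, the sample rate and the frame sizes are all assumptions chosen for illustration.

```python
import numpy as np

def magnitude_spectrogram_db(y, n_fft=512, hop_length=128):
    """Return a (n_fft // 2 + 1, n_frames) magnitude spectrogram in dB."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([
        y[i * hop_length: i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, frames)
    # Decibels relative to the peak; the floor avoids log(0).
    return 20.0 * np.log10(np.maximum(magnitude, 1e-10) / magnitude.max())

sr = 8000                                   # assumed sample rate in Hz
t = np.arange(sr) / sr                      # one second of sample times
y = np.sin(2 * np.pi * 440.0 * t)           # synthetic 440 Hz tone
spec_db = magnitude_spectrogram_db(y)

# The strongest frequency bin should sit next to 440 Hz.
peak_bin = int(np.argmax(spec_db.mean(axis=1)))
peak_hz = peak_bin * sr / 512
print(spec_db.shape, peak_bin, peak_hz)
```

Rows of the returned matrix are frequency bins and columns are time frames, which is the layout a spectrogram plot maps to the vertical and horizontal axes.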

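5. Feature Engineering

5.1 Extracting audio features and preparing the modeling data

X (the feature values extracted from the audio signal) is the feature data and y (the audio class) is the label data. Key code:

[Figure: screenshot of the feature-extraction code]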
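One common way to turn variable-length clips into fixed-length rows of X is to average a per-clip feature matrix over time. The sketch below assumes that approach; the source does not show which features were extracted. The random matrices stand in for real features (for instance the output of librosa.feature.melspectrogram, which uses 128 mel bands by default), and `clip_to_vector` is a hypothetical helper name.

```python
import numpy as np

def clip_to_vector(feature_matrix):
    """Average a (n_features, n_frames) matrix over time -> (n_features,)."""
    return np.mean(feature_matrix, axis=1)

rng = np.random.default_rng(0)

# Hypothetical clips: (feature matrix, class label). The matrices share the
# number of feature rows but differ in duration (number of frames).
clips = [
    (rng.random((128, 40)), 3),
    (rng.random((128, 173)), 0),
    (rng.random((128, 87)), 7),
]

X = np.array([clip_to_vector(matrix) for matrix, _ in clips])  # features
y = np.array([label for _, label in clips])                    # labels
print(X.shape, y.shape)
```

Averaging over time discards temporal order, which is a simplification that suits a plain fully connected network expecting one fixed-size vector per sample.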

5.2 Splitting the dataset

The dataset is split into a training set, a validation set and a test set. First, the full dataset is split into 90% training data and 10% test data; the 90% training portion is then split again into 80% training data and 20% validation data. Key code:

[Figure: screenshot of the dataset-splitting code]

[Figure: printed sample counts of the three sets]

The training set contains 6,286 samples, the validation set 1,572, and the test set 874.
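The two-stage split can be sketched with the standard library alone. This version rounds each held-out size up, which reproduces the counts reported above (scikit-learn's train_test_split, a common choice for this step, rounds a fractional test size the same way). The helper name `split_indices` and the seeds are assumptions.

```python
import math
import random

def split_indices(n_samples, held_out_fraction, seed=0):
    """Shuffle range(n_samples); return (kept, held_out) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_held_out = math.ceil(held_out_fraction * n_samples)
    return indices[n_held_out:], indices[:n_held_out]

# Stage 1: 90% training pool, 10% test.
train_pool, test = split_indices(8732, 0.10)

# Stage 2: split the pool into 80% training and 20% validation.
# The returned values are positions within train_pool.
train_pos, val_pos = split_indices(len(train_pool), 0.20, seed=1)
train = [train_pool[i] for i in train_pos]
val = [train_pool[i] for i in val_pos]

print(len(train), len(val), len(test))  # 6286 1572 874
```

Because the second split is taken from the 90% pool, the validation set is about 18% of the full dataset and the final training set about 72%.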

6. Building the Neural Network Classification Model

An ANN is used to classify the audio clips.

6.1 Model parameters

[Figure: table of model parameters]

Key code:

[Figure: screenshot of the model-building code]

[Figure: model summary output]

As the summary shows, the model has 1,411,160 parameters.
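The parameter count of a fully connected network follows directly from its layer widths: each dense layer contributes inputs × outputs weights plus one bias per output. The layer widths below are an assumption, not read from the source; they are simply one configuration (128 input features, six hidden layers, 10 output classes) that happens to reproduce the reported total.

```python
def dense_param_count(layer_sizes):
    """Total weights and biases of a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out  # weight matrix plus one bias per output unit
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# Assumed widths: input, six hidden layers, 10-class output.
layer_sizes = [128, 1000, 750, 500, 250, 100, 50, 10]
total = dense_param_count(layer_sizes)
print(total)  # 1411160
```

Most of the parameters sit in the second layer (1000 × 750 weights), so under this assumption the widest pair of adjacent layers dominates the model size.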

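7. Model Evaluation

7.1 Evaluation metrics and results

The main evaluation metrics are accuracy and the F1 score, among others.

[Figure: table of evaluation results]

As the table shows, accuracy is 89% and the F1 score is 88%, so the ANN classifier performs well.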

7.2 Loss curves

[Figure: training and validation loss curves]

Training and validation loss both decrease overall. Validation loss starts to rise after about epoch 16, so roughly 16 epochs of training reach the minimum validation loss.

Key code:

[Figure: screenshot of the loss-curve plotting code]
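Reading the best epoch off a loss curve can also be done programmatically. The sketch below assumes the per-epoch validation losses are available as a plain list (in Keras, for example, the history returned by fit() exposes them under the "val_loss" key). The numbers are made up for illustration and are not the author's results.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1

# Hypothetical validation losses: falling at first, then rising again.
val_loss = [2.0, 1.4, 1.0, 0.8, 0.7, 0.75, 0.9]
print(best_epoch(val_loss))  # 5
```

Training past this epoch keeps lowering the training loss while the validation loss grows, which is the usual sign of overfitting.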

7.3 Accuracy curves

[Figure: training and validation accuracy curves]

As the figure shows, training and validation accuracy both rise gradually, and validation accuracy reaches 89%.

7.4 Confusion matrix

Confusion matrix of the ANN classifier:

[Figure: confusion matrix]

As the figure shows, 97 samples of the first audio class are classified correctly, 37 of the second, 66 of the third, and so on.

7.5 Classification report

Classification report of the ANN classifier:

[Figure: classification report]

As the report shows, the F1 score is 0.95 for class 0, 0.89 for class 1 and 0.78 for class 2; the overall accuracy of the model is 89%.
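The relationship between the confusion matrix, accuracy and the per-class F1 scores can be sketched in plain Python. The labels below are a made-up three-class example, not the author's predictions; rows of the matrix are true classes and columns are predicted classes, so the diagonal holds the correctly classified counts.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def per_class_f1(matrix):
    """F1 = harmonic mean of precision and recall, computed per class."""
    scores = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        predicted_k = sum(row[k] for row in matrix)  # column sum
        actual_k = sum(matrix[k])                    # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores

# Hypothetical labels for a three-class problem.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred, 3)
accuracy = sum(cm[k][k] for k in range(3)) / len(y_true)
f1 = per_class_f1(cm)
print(cm)        # [[2, 1, 0], [0, 2, 0], [1, 0, 2]]
print(accuracy)  # 0.75
print(f1)
```

In a scikit-learn workflow the same quantities are usually obtained from confusion_matrix() and classification_report() in sklearn.metrics.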

7.6 Model prediction

Predict the class of the audio file 104327-2-0-26.wav. Key code:

[Figure: screenshot of the prediction code]

Prediction result:

[Figure: predicted class]

Actual value:

[Figure: actual class]
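A prediction step for a softmax classifier usually reduces to picking the class with the highest probability. The sketch below shows that step, plus a way to read the true class from the filename. The second part rests on an assumption the source does not state: that filenames follow an fsID-classID-occurrenceID-sliceID.wav scheme (as in the UrbanSound8K dataset), so the second field is the class ID. The probability values are invented.

```python
def predicted_class(probabilities):
    """Index of the largest probability (argmax)."""
    return max(range(len(probabilities)), key=lambda k: probabilities[k])

def class_id_from_filename(filename):
    """Read the class ID, ASSUMING 'fsID-classID-occurrenceID-sliceID.wav'."""
    return int(filename.split("-")[1])

# Hypothetical softmax output for 10 classes (values sum to 1).
probs = [0.01, 0.02, 0.85, 0.02, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01]

pred = predicted_class(probs)
actual = class_id_from_filename("104327-2-0-26.wav")
print(pred, actual, pred == actual)
```

If the naming assumption does not hold for the data at hand, the true label should be looked up in the metadata table instead.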

8. Conclusion and Outlook

As one of the hottest research directions in artificial intelligence, deep learning is widely used in speech, image and text recognition and has achieved remarkable results. Speech recognition, as a main human-machine interface of the future, directly affects the user experience of intelligent systems. The two technologies complement each other: on the one hand, the large amount of training data collected by speech recognition systems helps train deep networks that are more robust and generalize better; on the other hand, better deep networks can effectively improve recognition accuracy and reduce the impact of noise on speech recognition systems.

In summary, this project uses an ANN classification model, and the results show that it works well: accuracy reaches 89%, and the model can be used for modeling and prediction in everyday applications.

Original: https://blog.csdn.net/weixin_42163563/article/details/122133705
Author: 胖哥真不错
Title: 【项目实战】Python基于librosa和人工神经网络实现语音识别分类模型(ANN算法)项目实战

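Original articles are protected by copyright. When reposting, please credit the source: https://www.johngo689.com/498006/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author.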
