Speech Recognizer

May 23, 2023, 8:28 PM • Artificial Intelligence • 109 reads

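Environment
Main packages
Environment overview
Writing the code
1. Import the required packages
2. Define a class to create the hidden Markov model
3. Define a function to parse the command-line arguments
4. Define the main function
5. Run the script (here it is run from a terminal)
Notes
Reference books
Code reference
Background reading
Complete code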

Environment

I used a Kali Linux virtual machine, with an Anaconda environment, the PyCharm IDE, and Python 3.6.13.

Main packages

I installed everything from the Tsinghua PyPI mirror. The Tsinghua mirror is really handy, as anyone who has used it knows.
pip install --user -i https://pypi.tuna.tsinghua.edu.cn/simple numpy
pip install --user -i https://pypi.tuna.tsinghua.edu.cn/simple scipy
pip install --user -i https://pypi.tuna.tsinghua.edu.cn/simple hmmlearn
pip install --user -i https://pypi.tuna.tsinghua.edu.cn/simple python_speech_features

Environment overview

The word audio files used here are the ones we generated with text-to-speech and saved in the previous article.
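
The training script expects one subfolder per word, each holding that word's .wav files (for example audio_files/hello/hello.wav, as in the test list later in this article). A minimal sketch of scanning such a layout follows. The helper names list_wavs_by_label and make_demo_tree are hypothetical and exist only for illustration, and the demo creates empty placeholder files rather than real audio.

```python
import os
import tempfile


def list_wavs_by_label(input_folder):
    """Map each subfolder name (used as the word label) to its .wav paths."""
    wavs = {}
    for dirname in sorted(os.listdir(input_folder)):
        subfolder = os.path.join(input_folder, dirname)
        if not os.path.isdir(subfolder):
            continue  # skip stray files next to the word folders
        wavs[dirname] = sorted(
            os.path.join(subfolder, name)
            for name in os.listdir(subfolder)
            if name.endswith('.wav')
        )
    return wavs


def make_demo_tree(root, words=('hello', 'linux')):
    """Create empty placeholder files in the assumed layout (demo only)."""
    for word in words:
        os.makedirs(os.path.join(root, word), exist_ok=True)
        open(os.path.join(root, word, word + '.wav'), 'wb').close()


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        make_demo_tree(root)
        for label, paths in list_wavs_by_label(root).items():
            print(label, [os.path.basename(p) for p in paths])
```

Checking the layout this way before training makes it easier to spot an empty or misnamed word folder.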

Writing the code

1. Import the required packages


import os
import argparse
import numpy as np
from scipy.io import wavfile
from hmmlearn import hmm
from python_speech_features import mfcc

2. Define a class to create the hidden Markov model


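class HMMTrainer(object):
    """Thin wrapper around hmmlearn's GaussianHMM for a single word class."""

    def __init__(self, model_name='GaussianHMM', n_components=4, cov_type='diag', n_iter=1000):
        # n_components: number of hidden states
        # cov_type: covariance type of the Gaussian emissions
        # n_iter: maximum number of training iterations
        self.model_name = model_name
        self.n_components = n_components
        self.cov_type = cov_type
        self.n_iter = n_iter
        self.models = []

        if self.model_name == 'GaussianHMM':
            self.model = hmm.GaussianHMM(n_components=self.n_components,
                                         covariance_type=self.cov_type, n_iter=self.n_iter)
        else:
            raise TypeError('Invalid model type')

    def train(self, X):
        # X is a 2D array with one row of MFCC features per frame
        np.seterr(all='ignore')
        self.models.append(self.model.fit(X))

    def get_score(self, input_data):
        # Log-likelihood of input_data under the trained model
        return self.model.score(input_data)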
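
To make the meaning of get_score more concrete, here is a toy sketch of the forward algorithm, which computes the log-likelihood of an observation sequence under an HMM. It is not hmmlearn's implementation: it assumes one-dimensional observations, strictly positive start and transition probabilities, and hand-picked parameters, and the names hmm_log_likelihood, log_gaussian, and logsumexp are made up for this illustration.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(obs, start_prob, trans_prob, means, variances):
    """Forward algorithm in log space for a 1-D Gaussian HMM.

    Assumes every start and transition probability is strictly positive.
    """
    n = len(start_prob)
    # alpha[j]: log probability of the observations so far, ending in state j
    alpha = [math.log(start_prob[j]) + log_gaussian(obs[0], means[j], variances[j])
             for j in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + math.log(trans_prob[i][j]) for i in range(n)])
            + log_gaussian(x, means[j], variances[j])
            for j in range(n)
        ]
    return logsumexp(alpha)


# Hand-picked toy parameters (illustrative only, not learned from audio)
start = [0.9, 0.1]
trans = [[0.7, 0.3], [0.1, 0.9]]
obs = [0.1, -0.2, 0.0, 5.1, 4.9]

score_a = hmm_log_likelihood(obs, start, trans, [0.0, 5.0], [1.0, 1.0])
score_b = hmm_log_likelihood(obs, start, trans, [10.0, 15.0], [1.0, 1.0])
print(score_a > score_b)  # the model whose means match the data scores higher
```

With unit variances the Gaussian density never exceeds 1, so both scores here are negative. This is worth remembering when comparing scores against a fixed threshold such as 0.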

3. Define a function to parse the command-line arguments


def build_arg_parse():
    parser = argparse.ArgumentParser(description='Trains the HMM classifier')
    parser.add_argument("--input-folder", dest="input_folder", required=True,
                        help="Input folder containing the audio files in subfolders")
    return parser
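
A short usage sketch of this parser: parse_args can be given an explicit list, which makes it easy to try without a terminal. The function is repeated here so the snippet runs on its own, and the folder name audio_files is just the one used in the run command of this article.

```python
import argparse


def build_arg_parse():
    parser = argparse.ArgumentParser(description='Trains the HMM classifier')
    parser.add_argument("--input-folder", dest="input_folder", required=True,
                        help="Input folder containing the audio files in subfolders")
    return parser


# Equivalent to: python speech_recognizer.py --input-folder audio_files
args = build_arg_parse().parse_args(['--input-folder', 'audio_files'])
print(args.input_folder)
```

Because the argument is declared with required=True, running the script without --input-folder makes argparse print a usage message and exit.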

4. Define the main function


if __name__ == '__main__':
    args = build_arg_parse().parse_args()
    input_folder = args.input_folder

    hmm_models = []

    for dirname in os.listdir(input_folder):

        subfolder = os.path.join(input_folder, dirname)

        if not os.path.isdir(subfolder):
            continue

        label = subfolder[subfolder.rfind('/') + 1:]

        X = np.array([])
        y_words = []

        # Debug helper: list the .wav files found in this subfolder
        # for x in os.listdir(subfolder):
        #     if x.endswith('.wav'):
        #         print(x)
        for filename in [x for x in os.listdir(subfolder) if x.endswith('.wav')]:
            print(filename)

            filepath = os.path.join(subfolder, filename)

            sampling_freq, audio = wavfile.read(filepath)
            print(sampling_freq)

            mfcc_features = mfcc(audio, sampling_freq)

            if len(X) == 0:
                X = mfcc_features
            else:
                X = np.append(X, mfcc_features, axis=0)

            y_words.append(label)

        hmm_trainer = HMMTrainer()
        hmm_trainer.train(X)
        hmm_models.append((hmm_trainer, label))
        hmm_trainer = None

    input_files = ['audio_files/hello/hello.wav',
                   'audio_files/linux/linux.wav',
                   'audio_files/python/python.wav',
                   'audio_files/windows/windows.wav',
                   'audio_files/你好/你好.wav',
                   'audio_files/place/place.wav',
                   'audio_files/variables/variables.wav']

    for input_file in input_files:

        sampling_freq, audio = wavfile.read(input_file)

        mfcc_features = mfcc(audio, sampling_freq)

        max_score = float('-inf')  # log-likelihood scores are usually negative
        output_label = None

        for item in hmm_models:
            hmm_model, label = item

            score = hmm_model.get_score(mfcc_features)
            if score > max_score:
                max_score = score
                output_label = label

        print('\nTrue:', input_file[input_file.find('/') + 1:input_file.rfind('/')])
        print('Predicted:', output_label)
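
The prediction step boils down to taking the label whose model gives the highest log-likelihood. A minimal sketch of that selection follows, starting the running maximum at negative infinity, because log-likelihood scores are usually negative and a starting value of 0 might never be exceeded. The helper names and the example scores are made up for illustration.

```python
import os


def pick_best_label(scores):
    """Return the label with the highest score, or None if scores is empty."""
    best_label, best_score = None, float('-inf')
    for label, score in scores.items():
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def true_label_from_path(path):
    """Take the parent folder name of a file as its true label."""
    return os.path.basename(os.path.dirname(path))


# Made-up scores, only to show the selection logic
example_scores = {'hello': -1520.3, 'linux': -1710.8, 'python': -1688.1}
print('True:', true_label_from_path('audio_files/hello/hello.wav'))
print('Predicted:', pick_best_label(example_scores))
```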

5. Run the script (here it is run from a terminal)

python speech_recognizer.py --input-folder audio_files

Notes

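Running the code above reports an error.

The error does not stop the program from running, but it hurts recognition accuracy.
Fix: according to what I found online, it is a sampling-rate problem. The 22 kHz sampling rate needs to be changed to 16 kHz.
Original answer: https://github.com/mozilla/DeepSpeech/issues/1888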

Following that hint, I printed the sampling rate and found that it was indeed 22 kHz.
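
The script prints the rate returned by wavfile.read. As an alternative sketch, the sampling rate of a PCM .wav file can also be checked with Python's standard wave module alone. The helper names below are hypothetical, and the demo writes a short silent file at 22050 Hz instead of using the article's recordings.

```python
import os
import tempfile
import wave


def read_sample_rate(path):
    """Return the sampling rate (frames per second) stored in a .wav header."""
    with wave.open(path, 'rb') as wav_file:
        return wav_file.getframerate()


def write_silent_wav(path, sample_rate, n_frames=100):
    """Write a mono 16-bit silent .wav file (demo data only)."""
    with wave.open(path, 'wb') as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b'\x00\x00' * n_frames)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        demo_path = os.path.join(folder, 'demo.wav')
        write_silent_wav(demo_path, 22050)
        print(read_sample_rate(demo_path))
```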

Then change the sampling rate. Two places need to change: the two mfcc(...) calls, one in the training loop and one in the test loop. Both pass 16000 in the complete code below.

After changing the sampling rate, recognition accuracy improved considerably.
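
The complete code in this article passes 16000 to mfcc while leaving the audio samples untouched. A different option, shown here only as a sketch and not as the method used in this article, is to resample the audio itself to 16 kHz before feature extraction. This assumes scipy.signal.resample_poly is acceptable for the task; the helper name resample_to is hypothetical.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_to(audio, source_rate, target_rate=16000):
    """Resample a 1-D signal from source_rate to target_rate (both in Hz)."""
    if source_rate == target_rate:
        return audio
    divisor = gcd(source_rate, target_rate)
    up, down = target_rate // divisor, source_rate // divisor
    # Polyphase resampling: upsample by `up`, low-pass filter, downsample by `down`
    return resample_poly(audio.astype(np.float64), up, down)


# One second of a 440 Hz tone at 22050 Hz (synthetic demo data)
t = np.arange(22050) / 22050.0
tone = np.sin(2 * np.pi * 440 * t)
resampled = resample_to(tone, 22050)
print(len(tone), len(resampled))
```

After resampling, the features would be computed from the new signal with the new rate, for example mfcc(resampled, 16000).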

Reference books

Code reference

The code in this article largely follows the book 《Python 机器学习经典实例》.
    English title: Python Machine Learning Cookbook

Background reading

For deep learning fundamentals, I referred to:
    《Python深度学习-基于TensorFlow》

Complete code


import os
import argparse
import numpy as np
from scipy.io import wavfile
from hmmlearn import hmm
from python_speech_features import mfcc

class HMMTrainer(object):

    def __init__(self, model_name='GaussianHMM', n_components=4, cov_type='diag', n_iter=1000):

        self.model_name = model_name
        self.n_components = n_components
        self.cov_type = cov_type
        self.n_iter = n_iter
        self.models = []

        if self.model_name == 'GaussianHMM':
            self.model = hmm.GaussianHMM(n_components=self.n_components,
                                         covariance_type=self.cov_type, n_iter=self.n_iter)
        else:
            raise TypeError('Invalid model type')

    def train(self, X):
        np.seterr(all='ignore')
        self.models.append(self.model.fit(X))

    def get_score(self, input_data):
        return self.model.score(input_data)

def build_arg_parse():
    parser = argparse.ArgumentParser(description='Trains the HMM classifier')
    parser.add_argument("--input-folder", dest="input_folder", required=True,
                        help="Input folder containing the audio files in subfolders")
    return parser

if __name__ == '__main__':
    args = build_arg_parse().parse_args()
    input_folder = args.input_folder

    hmm_models = []

    for dirname in os.listdir(input_folder):

        subfolder = os.path.join(input_folder, dirname)

        if not os.path.isdir(subfolder):
            continue

        label = subfolder[subfolder.rfind('/') + 1:]

        X = np.array([])
        y_words = []

        # Debug helper: list the .wav files found in this subfolder
        # for x in os.listdir(subfolder):
        #     if x.endswith('.wav'):
        #         print(x)
        for filename in [x for x in os.listdir(subfolder) if x.endswith('.wav')]:
            print(filename)

            filepath = os.path.join(subfolder, filename)

            sampling_freq, audio = wavfile.read(filepath)
            print(sampling_freq)

            mfcc_features = mfcc(audio, 16000)

            if len(X) == 0:
                X = mfcc_features
            else:
                X = np.append(X, mfcc_features, axis=0)

            y_words.append(label)

        hmm_trainer = HMMTrainer()
        hmm_trainer.train(X)
        hmm_models.append((hmm_trainer, label))
        hmm_trainer = None

    input_files = ['audio_files/hello/hello.wav',
                   'audio_files/linux/linux.wav',
                   'audio_files/python/python.wav',
                   'audio_files/windows/windows.wav',
                   'audio_files/你好/你好.wav',
                   'audio_files/place/place.wav',
                   'audio_files/variables/variables.wav']

    for input_file in input_files:

        sampling_freq, audio = wavfile.read(input_file)

        mfcc_features = mfcc(audio, 16000)

        max_score = float('-inf')  # log-likelihood scores are usually negative
        output_label = None

        for item in hmm_models:
            hmm_model, label = item

            score = hmm_model.get_score(mfcc_features)
            if score > max_score:
                max_score = score
                output_label = label

        print('\nTrue:', input_file[input_file.find('/') + 1:input_file.rfind('/')])
        print('Predicted:', output_label)

Original: https://blog.csdn.net/qq_45071353/article/details/123724616
Author: 初冬的早晨
Title: Speech Recognizer (语音识别器)


