Speech Recognition in Python with the Baidu API

1. About the authors

Gao Zhixiang, male, School of Electronic Information, Xi'an Polytechnic University, 2021 cohort

Research interests: machine vision and artificial intelligence

Email: 562076173@qq.com

Liu Shuaibo, male, School of Electronic Information, Xi'an Polytechnic University, 2021 cohort, Zhang Hongwei Artificial Intelligence Research Group

Research interests: machine vision and artificial intelligence

Email: 1461004501@qq.com

2. Mandarin speech recognition with the Baidu API

2.1 Speech recognition

Speech recognition converts a speech signal into the corresponding text. A recognition system has four main parts: feature extraction, an acoustic model, a language model, and a dictionary with a decoder. To extract features more effectively, the captured audio is usually preprocessed first, for example by filtering and framing, so that the portion of the signal to be analyzed is properly isolated from the raw recording.
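
The filtering and framing steps mentioned above can be sketched in a few lines. This is only an illustration of the idea, not the preprocessing that Baidu's service performs: the function names `pre_emphasis` and `split_frames`, the coefficient 0.97, and the 25 ms / 10 ms frame sizes are assumptions borrowed from common speech front ends.

```python
def pre_emphasis(samples, coeff=0.97):
    """Apply a first-order high-pass filter: y[n] = x[n] - coeff * x[n-1]."""
    if not samples:
        return []
    out = [float(samples[0])]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - coeff * prev)
    return out


def split_frames(samples, frame_len, hop_len):
    """Cut the signal into overlapping frames; an incomplete tail is dropped."""
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


if __name__ == "__main__":
    rate = 16000                   # same sample rate as the recording code
    frame_len = rate * 25 // 1000  # 25 ms frame (assumed value)
    hop_len = rate * 10 // 1000    # 10 ms hop (assumed value)
    signal = [0] * rate            # one second of silence as dummy input
    frames = split_frames(pre_emphasis(signal), frame_len, hop_len)
    print(len(frames), "frames of", frame_len, "samples each")
```

Feature extraction would then run on each frame separately; overlapping the frames keeps short sounds from being cut at a frame boundary.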

General workflow:

[Figure: general speech recognition workflow]

2.2 Calling the Baidu API

Creating an application (for example, one for speech technology) on Baidu's AI development platform grants access to the corresponding API features.

Once the application is created, Baidu shows it in an application list. The AppID, API Key, and Secret Key listed there are what you use to call the API.
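
As a sketch of how the two keys are used, the snippet below builds the token request URL that the code in section 4 posts to. Reading the keys from environment variables is an assumption made for illustration: the variable names `BAIDU_API_KEY` and `BAIDU_SECRET_KEY` and the helper names are hypothetical, not something the Baidu platform defines.

```python
import os
from urllib.parse import urlencode

TOKEN_ENDPOINT = "https://openapi.baidu.com/oauth/2.0/token"  # as used in section 4


def load_credentials(env=os.environ):
    """Read the two keys from a mapping (hypothetical variable names)."""
    api_key = env.get("BAIDU_API_KEY", "")
    secret_key = env.get("BAIDU_SECRET_KEY", "")
    if not api_key or not secret_key:
        raise RuntimeError("set BAIDU_API_KEY and BAIDU_SECRET_KEY first")
    return api_key, secret_key


def build_token_url(api_key, secret_key):
    """Build the client-credentials token URL with properly escaped values."""
    query = urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,
        "client_secret": secret_key,
    })
    return TOKEN_ENDPOINT + "?" + query


if __name__ == "__main__":
    # Placeholder values; real keys come from the application list.
    demo_env = {"BAIDU_API_KEY": "demo-key", "BAIDU_SECRET_KEY": "demo-secret"}
    key, secret = load_credentials(demo_env)
    print(build_token_url(key, secret))
```

Keeping the keys out of the source file makes it less likely that they end up in a shared copy of the script.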

3. Experiment

3.1 Preparation

This experiment uses the Baidu API for recognition, so the baidu-aip module needs to be installed.
Open a command line and run pip install baidu-aip.

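[Figure: screenshot of the successful installation]
The figure above shows a successful installation.
Because this project uses pyqt5 for its interface, the pyqt5 module is also required; install it from the command line with pip install pyqt5.
The listing in section 4 also imports requests and pyaudio, which can be installed with pip in the same way.
Finally, go to the Baidu AI website, create an application, and obtain its AppID, API Key, and Secret Key.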
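
A quick way to confirm the installation is to check that each module can be found before running the program. The sketch below uses only the standard library, and the helper name `missing_modules` is hypothetical. It assumes the usual import names: `aip` for baidu-aip and `PyQt5` for pyqt5, plus `requests` and `pyaudio`, which the listing in section 4 imports.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = ["aip", "PyQt5", "requests", "pyaudio"]
    missing = missing_modules(required)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required modules are available.")
```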

3.2 Results

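At the prompt, type the number for the language you want and press Enter. Recording starts right away, and once the speech has been recognized a Baidu search page opens for the recognized text, which shows that the experiment succeeded.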
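
The number typed at the prompt is passed to the API as `dev_pid`. A small validation step, sketched below, avoids a crash when the input is empty or is not one of the listed values. The helper name `parse_dev_pid` and the fallback to 1537 (the default of `speech2text` in section 4) are choices made for this illustration; the English labels are translations of the menu in section 4.

```python
# Language models offered by the menu in section 4 (labels translated).
DEV_PIDS = {
    1536: "Mandarin (simple English)",
    1537: "Mandarin (with punctuation)",
    1737: "English",
    1637: "Cantonese",
    1837: "Sichuanese",
}


def parse_dev_pid(text, default=1537):
    """Turn the user's input into a known dev_pid, or fall back to `default`."""
    try:
        value = int(text.strip())
    except ValueError:
        return default
    return value if value in DEV_PIDS else default


if __name__ == "__main__":
    for raw in ["1737", " 1637 ", "", "abc", "9999"]:
        pid = parse_dev_pid(raw)
        print(repr(raw), "->", pid, DEV_PIDS[pid])
```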

4. Experiment code

import wave
import requests
import time
import base64
from pyaudio import PyAudio, paInt16
import webbrowser

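# Recording parameters: 16 kHz, mono, 16-bit samples
framerate = 16000   # sample rate in Hz
num_samples = 2000  # frames read from the microphone per chunk
channels = 1
sampwidth = 2       # bytes per sample (matches paInt16)
FILEPATH = 'speech.wav'

# Token endpoint; fill in the API Key and Secret Key of your own application
base_url = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=%s&client_secret=%s"
APIKey = "********"
SecretKey = "**********"

HOST = base_url % (APIKey, SecretKey)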

def getToken(host):
    # Exchange the API Key and Secret Key for an access token
    res = requests.post(host, timeout=10)
    res.raise_for_status()
    return res.json()['access_token']

def save_wave_file(filepath, data):
    # Write the recorded chunks to a WAV file; the file is closed automatically
    with wave.open(filepath, 'wb') as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(framerate)
        wf.writeframes(b''.join(data))

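def my_record():
    # Record about 4 seconds from the default microphone and save to FILEPATH
    pa = PyAudio()
    stream = pa.open(format=paInt16, channels=channels,
                     rate=framerate, input=True, frames_per_buffer=num_samples)
    my_buf = []

    t = time.time()
    print('Recording...')

    while time.time() < t + 4:
        string_audio_data = stream.read(num_samples)
        my_buf.append(string_audio_data)
    print('Recording finished.')
    # Stop the stream and release the audio device before writing the file
    stream.stop_stream()
    stream.close()
    pa.terminate()
    save_wave_file(FILEPATH, my_buf)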

def get_audio(file):
    with open(file, 'rb') as f:
        data = f.read()
    return data

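def speech2text(speech_data, token, dev_pid=1537):
    # Send the WAV bytes to the recognition endpoint; dev_pid selects the language model
    FORMAT = 'wav'
    RATE = 16000  # integer sample rate, must match the recording
    CHANNEL = 1
    CUID = '*******'  # a string that identifies this client
    SPEECH = base64.b64encode(speech_data).decode('utf-8')

    data = {
        'format': FORMAT,
        'rate': RATE,
        'channel': CHANNEL,
        'cuid': CUID,
        'len': len(speech_data),  # size of the raw audio in bytes, before base64
        'speech': SPEECH,
        'token': token,
        'dev_pid': dev_pid
    }
    url = 'https://vop.baidu.com/server_api'
    headers = {'Content-Type': 'application/json'}

    print('Recognizing...')
    r = requests.post(url, json=data, headers=headers, timeout=30)
    Result = r.json()
    # On success the candidates are under 'result'; otherwise return the raw response
    if 'result' in Result:
        return Result['result'][0]
    return Result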

def openbrowser(text):
    # Open a known site for a few keywords, otherwise search Baidu for the text
    from urllib.parse import quote  # escapes spaces, '&' and non-ASCII characters

    maps = {
        '百度': ['百度', 'baidu'],
        '腾讯': ['腾讯', 'tengxun'],
        '网易': ['网易', 'wangyi']
    }
    if text in maps['百度']:
        webbrowser.open_new_tab('https://www.baidu.com')
    elif text in maps['腾讯']:
        webbrowser.open_new_tab('https://www.qq.com')
    elif text in maps['网易']:
        webbrowser.open_new_tab('https://www.163.com/')
    else:
        webbrowser.open_new_tab('https://www.baidu.com/s?wd=%s' % quote(text))

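if __name__ == '__main__':
    flag = 'y'
    while flag.lower() == 'y':
        print('Enter a number to choose the language:')
        devpid = input('1536: Mandarin (simple English), 1537: Mandarin (with punctuation), 1737: English, 1637: Cantonese, 1837: Sichuanese\n')
        my_record()
        TOKEN = getToken(HOST)
        speech = get_audio(FILEPATH)
        result = speech2text(speech, TOKEN, int(devpid))
        print(result)
        # A string means success; a dict is the raw error response
        if isinstance(result, str):
            # Drop leading and trailing punctuation (ASCII and full-width) before matching
            openbrowser(result.strip(',。,. '))
        flag = input('Continue?(y/n):')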
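
`speech2text` returns either the recognized string or, when no `result` field is present, the whole response dictionary. A stricter alternative is sketched below: it turns a failed response into an exception so the caller does not have to check the return type. It assumes the response carries `err_no` and `err_msg` fields alongside `result`, as Baidu's speech recognition responses normally do; the names `RecognitionError` and `extract_text` are hypothetical, and the sample dictionaries are illustrative values rather than captured output.

```python
class RecognitionError(Exception):
    """Raised when the service response does not contain a usable result."""


def extract_text(response):
    """Return the first recognized candidate from a parsed JSON response."""
    err_no = response.get("err_no", 0)
    results = response.get("result")
    if err_no != 0 or not results:
        message = response.get("err_msg", "no result in response")
        raise RecognitionError(
            "recognition failed (err_no=%s): %s" % (err_no, message))
    return results[0]


if __name__ == "__main__":
    # Illustrative responses, not captured from a real call.
    ok = {"err_no": 0, "err_msg": "success.", "result": ["百度"]}
    print(extract_text(ok))
    bad = {"err_no": 3301, "err_msg": "speech quality error."}
    try:
        extract_text(bad)
    except RecognitionError as exc:
        print(exc)
```

With a helper like this, the main loop could wrap the call in `try`/`except RecognitionError` instead of testing whether the result is a string.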

Original: https://blog.csdn.net/m0_37758063/article/details/123645822
Author: ZHW_AI课题组
Title: Python调用百度API进行语音识别
