Building a Voice Robot with a Raspberry Pi

I have been meaning to write an article about voice robots for a while. It so happens that I was recently asked to build one on a Raspberry Pi, so I went through the whole workflow again and decided to write it up. This is a voice assistant based on the Turing Robot (图灵机器人) API and the Baidu speech APIs.


Preparation

Hardware preparation

First, we need to attach a microphone and a speaker to the Raspberry Pi. Of course, you can plug in headphones instead of a speaker and debug with those.

To check that the Raspberry Pi recognizes the USB microphone, enter:

lsusb

or
arecord -l

Once the microphone shows up, note its card and device numbers (for example, card 1, device 0 corresponds to plughw:1,0) and make a test recording:
arecord -D "plughw:1,0" -f dat -c 1 -r 16000 -d 5 test.wav

You can play the file back with aplay test.wav to check it. If the recording has a lot of background noise, try adjusting the capture level with alsamixer.


Package preparation

The packages we usually need to install are as follows:

pip3 install baidu-aip
pip3 install requests

Building the robot

Note that before we start, we need to register for a Baidu AI account and a Turing Robot account, because the APIs used below all require keys from their consoles.

Recording

Recording is usually very simple; the following code is all that is needed:

import os
# plughw:1,0 = card 1, device 0; S16_LE = signed 16-bit samples; 16 kHz; record 4 seconds into path
os.system('sudo arecord -D "plughw:1,0" -f S16_LE -r 16000 -d 4 ' + path)

However, this only worked the first time for me; after that, arecord kept failing with an error reported at arecord: main:828. I tried many fixes without success, so I switched to a different recording method. It is a bit more involved, so decide whether you need it; it relies on the pyaudio package:

sudo apt-get install portaudio19-dev
pip3 install pyaudio

def SoundRecording(path):
    # record a short clip from the default input device and save it as a wav file
    import pyaudio
    import wave
    import os
    import sys
    CHUNK = 512                 # frames read from the stream per loop iteration
    FORMAT = pyaudio.paInt16    # 16-bit samples, which is what Baidu ASR expects
    CHANNELS = 1                # mono
    RATE = 16000                # 16 kHz sample rate
    RECORD_SECONDS = 5          # length of the recording in seconds
    WAVE_OUTPUT_FILENAME = path
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    print("recording...")
    frames = []
    # read RATE / CHUNK chunks per second for RECORD_SECONDS seconds
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("done")
    stream.stop_stream()
    stream.close()
    p.terminate()
    # write the captured frames out as a standard wav file
    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()
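
Note that stream = p.open(..., input=True) records from the system's default input device. If the USB microphone is not the default, you can list the input devices and pass input_device_index explicitly; the sketch below is only an illustration, and the index (2 here) depends on your machine:

import pyaudio

p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    if info.get("maxInputChannels", 0) > 0:
        # every device that can record; note the index of the USB microphone
        print(i, info["name"])
p.terminate()

# then open the stream with that index inside SoundRecording, e.g.
# stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True,
#                 input_device_index=2, frames_per_buffer=CHUNK)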

Speech to text

This part is fairly simple: we just call the Baidu API. First create an application in the Baidu AI console to get its APP ID, API Key (AK), and Secret Key (SK), then obtain an access_token:

from aip import AipSpeech
import requests

APP_ID = '22894511'
API_KEY = 'En7e3iR8dHO1F7Hx3Fy7M0vd'
SECRET_KEY = 'c1591BrrbodXP5zQuBcQSNim8xcL6ZiE'
client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

# the AipSpeech client manages its own token; this access_token is only needed
# if you want to call the Baidu REST API directly
host = f'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=6KLdtAifYT46PtyzULAGpIzu&client_secret=tCEEz7LC4XfD2RA4ojgdOUvBBd7i3T4Y'
access_token = requests.get(host).json()["access_token"]
def SpeechRecognition(path):
    # read the recorded wav file and send it to Baidu's speech recognition service
    with open(path, 'rb') as fp:
        voices = fp.read()
    try:
        # dev_pid 1537 selects the Mandarin model
        result = client.asr(voices, 'wav', 16000, {'dev_pid': 1537, })
        result_text = result["result"][0]
        print("you said: " + result_text)
        return result_text
    except KeyError:
        # recognition failed: no "result" field in the response
        print("KeyError")

Turing Robot reply

Here we just send the recognized text to the Turing Robot API. For that we also need to register a Turing Robot account and get its API key (AK).


turing_api_key = "自己的AK"
api_url = "http://openapi.tuling123.com/openapi/api/v2"
headers = {'Content-Type': 'application/json;charset=UTF-8'}

def TuLing(text_words=""):
    req = {
        "reqType": 0,
        "perception": {
            "inputText": {
                "text": text_words
            },
            "selfInfo": {
                "location": {
                    "city": "天津",
                    "province": "天津",
                    "street": "天津科技大学"
                }
            }
        },
        "userInfo": {
            "apiKey": turing_api_key,
            "userId": "Leosaf"
        }
    }

    req["perception"]["inputText"]["text"] = text_words
    response = requests.request("post", api_url, json=req, headers=headers)
    response_dict = json.loads(response.text)

    result = response_dict["results"][0]["values"]["text"]
    print("AI Robot said: " + result)

    return result
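
If the API key is wrong, the request fails, or the free quota is used up, the response may not contain results, and the indexing above raises a KeyError. A small wrapper can keep the main loop alive; TuLingSafe and the fallback sentence are my own additions, not part of the Turing API:

def TuLingSafe(text_words=""):
    try:
        return TuLing(text_words)
    except (KeyError, IndexError):
        # the response had no results[0]["values"]["text"]; answer with a fixed fallback
        return "对不起,我没有听懂"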

Text to speech

The value that comes back is text, and of course we want to turn it into speech, so let's wire up Baidu's speech synthesis next.

def SpeechSynthesis(text_words=""):
    result = client.synthesis(text_words, 'zh', 1, {'per': 4, 'vol': 10, 'pit': 9, 'spd': 5})
    if not isinstance(result, dict):
        with open('app.mp3', 'wb') as f:
            f.write(result)
    os.system('mpg321 app.mp3')
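
One thing to watch out for: when synthesis fails, the code above still replays whatever app.mp3 was written last time. A variant that skips playback in that case (SpeechSynthesisSafe is just my own name for it):

def SpeechSynthesisSafe(text_words=""):
    result = client.synthesis(text_words, 'zh', 1, {'per': 4, 'vol': 10, 'pit': 9, 'spd': 5})
    if isinstance(result, dict):
        # synthesis failed; the dict holds Baidu's error information
        print("TTS error:", result)
        return
    with open('app.mp3', 'wb') as f:
        f.write(result)
    os.system('mpg321 app.mp3')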

Full code

This version uses pyaudio for recording; if you don't need it, feel free to switch back to the arecord approach.


import json
import os
import requests
from aip import AipSpeech

BaiDu_APP_ID = "22894511"
API_KEY = "En7e3iR8dHO1F7Hx3Fy7M0vd"
SECRET_KEY = "c1591BrrbodXP5zQuBcQSNim8xcL6ZiE"
client = AipSpeech(BaiDu_APP_ID, API_KEY, SECRET_KEY)

turing_api_key = '67d5386150e248fea4af3db80f4ca1ae'
api_url = 'http://openapi.tuling123.com/openapi/api/v2'
headers = {'Content-Type': 'application/json;charset=UTF-8'}

host = f'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=6KLdtAifYT46PtyzULAGpIzu&client_secret=tCEEz7LC4XfD2RA4ojgdOUvBBd7i3T4Y'
access_token = requests.get(host).json()["access_token"]
running = True
resultText, path = "", "output.wav"

def SoundRecording(path):
    import pyaudio
    import wave
    import os
    import sys
    CHUNK = 512
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16000
    RECORD_SECONDS = 5
    WAVE_OUTPUT_FILENAME = path
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    print("recording...")
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("done")
    stream.stop_stream()
    stream.close()
    p.terminate()
    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()

def SpeechRecognition(path):
    with open(path, 'rb') as fp:
        voices = fp.read()
    try:
        result = client.asr(voices, 'wav', 16000, {'dev_pid': 1537, })
        result_text = result["result"][0]
        print("you said: " + result_text)
        return result_text
    except KeyError:
        print("KeyError")

def TuLing(text_words=""):
    req = {
        "reqType": 0,
        "perception": {
            "inputText": {
                "text": text_words
            },
            "selfInfo": {
                "location": {
                    "city": "天津",
                    "province": "天津",
                    "street": "天津科技大学"
                }
            }
        },
        "userInfo": {
            "apiKey": turing_api_key,
            "userId": "Leosaf"
        }
    }

    req["perception"]["inputText"]["text"] = text_words
    response = requests.request("post", api_url, json=req, headers=headers)
    response_dict = json.loads(response.text)

    result = response_dict["results"][0]["values"]["text"]
    print("AI Robot said: " + result)

    return result

def SpeechSynthesis(text_words=""):
    result = client.synthesis(text_words, 'zh', 1, {'per': 4, 'vol': 10, 'pit': 9, 'spd': 5})
    if not isinstance(result, dict):
        with open('app.mp3', 'wb') as f:
            f.write(result)
    os.system('mpg321 app.mp3')

if __name__ == '__main__':
    while running:
        SoundRecording(path)
        resultText = SpeechRecognition(path)
        if not resultText:
            # recognition failed, record again instead of sending empty text to the bot
            continue
        response = TuLing(resultText)
        if '退出' in response or '再见' in response or '拜拜' in response:
            running = False
        SpeechSynthesis(response)

Original: https://blog.csdn.net/qq_51718832/article/details/116229618
Author: Leosaf
Title: 用树莓派做一个语音机器人 (Building a Voice Robot with a Raspberry Pi)
