I had long wanted to write an article about voice robots. As it happens, I was recently asked to build a Raspberry Pi voice robot, so after walking through the workflow again to refresh my memory, I decided to write it up. This is a voice assistant based on the Turing robot and the Baidu API.
Preparation
Hardware preparation
First, we need to attach a microphone and a speaker to the Raspberry Pi. Of course, we can also plug in headphones instead of a speaker and debug with those.
To check that the devices are recognized, run:
lsusb
or
arecord -l
Once the devices are recognized, record a test clip:
arecord -D "plughw:1,0" -f dat -c 1 -r 16000 -d 5 test.wav
If the recording is very noisy, try tuning the input levels with alsamixer.
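Besides listening to the clip, a quick way to judge noise is to measure its RMS level. Below is a minimal sketch using only the Python standard library; the helper name and the silent test file are my own illustration, not part of the original setup:

```python
import math
import struct
import wave

def rms_of_wav(path):
    """Return the RMS amplitude of a mono 16-bit WAV file, scaled to 0.0..1.0."""
    with wave.open(path, 'rb') as wf:
        assert wf.getsampwidth() == 2 and wf.getnchannels() == 1
        raw = wf.readframes(wf.getnframes())
    samples = struct.unpack('<%dh' % (len(raw) // 2), raw)
    if not samples:
        return 0.0
    mean_sq = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_sq) / 32768.0

# Example: one second of pure silence should measure as 0.0.
with wave.open('silence.wav', 'wb') as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)          # 16-bit
    wf.setframerate(16000)
    wf.writeframes(b'\x00\x00' * 16000)
print(rms_of_wav('silence.wav'))  # 0.0
```

A real recording with background hiss will show a noticeably higher value even when you say nothing, which makes the effect of alsamixer adjustments easy to compare.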
Package preparation
The packages we usually need to install are as follows:
pip3 install baidu-aip
pip3 install requests
Robot preparation
Note that before anything can work, we must first sign up: the Baidu AI and Turing robot services used below both require registered accounts.
Recording
Recording is usually very simple; the following code is enough:
import os
os.system('sudo arecord -D "plughw:1,0" -f S16_LE -r 16000 -d 4 ' + path)
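The os.system call above discards the recorder's exit status, so failures go unnoticed. A sketch of running the same command with subprocess and surfacing errors; the helper names and defaults here are my own:

```python
import subprocess

def arecord_cmd(path, device="plughw:1,0", rate=16000, seconds=4):
    """Build the same arecord argument list as the os.system call above."""
    return ["arecord", "-D", device, "-f", "S16_LE",
            "-r", str(rate), "-d", str(seconds), path]

def record(path):
    """Run arecord and print its stderr if it fails, returning the exit code."""
    result = subprocess.run(arecord_cmd(path),
                            capture_output=True, text=True)
    if result.returncode != 0:
        print("arecord failed:", result.stderr.strip())
    return result.returncode

print(arecord_cmd("test.wav"))
```

With this, the error message behind a failure (such as the main:828 one below) is printed instead of silently swallowed.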
However, this only worked for me the first time; after that, every recording failed with an arecord: main:828 error. I tried many fixes without success, so I switched to another recording method. It is a bit more involved, so use whichever suits your needs; it requires the pyaudio package:
pip3 install pyaudio
If the install fails to build, install the PortAudio development headers first and retry:
sudo apt-get install portaudio19-dev
pip3 install pyaudio
def SoundRecording(path):
    import pyaudio
    import wave

    CHUNK = 512                # frames per buffer
    FORMAT = pyaudio.paInt16   # 16-bit samples
    CHANNELS = 1               # mono
    RATE = 16000               # 16 kHz sample rate
    RECORD_SECONDS = 5
    WAVE_OUTPUT_FILENAME = path

    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    print("recording...")
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("done")
    stream.stop_stream()
    stream.close()
    p.terminate()

    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()
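The loop above reads int(RATE / CHUNK * RECORD_SECONDS) buffers of CHUNK frames each, so the file comes out slightly under RECORD_SECONDS long because the division truncates. We can verify the arithmetic without a microphone by writing the same number of silent frames with the wave module:

```python
import wave

CHUNK, RATE, RECORD_SECONDS = 512, 16000, 5
n_buffers = int(RATE / CHUNK * RECORD_SECONDS)   # 156 buffers, truncated from 156.25
n_frames = n_buffers * CHUNK                     # 79872 frames

with wave.open('check.wav', 'wb') as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)                           # 16-bit, matching paInt16
    wf.setframerate(RATE)
    wf.writeframes(b'\x00\x00' * n_frames)

with wave.open('check.wav', 'rb') as wf:
    duration = wf.getnframes() / wf.getframerate()
print(duration)  # 4.992 seconds, not quite 5
```

The missing fraction of a second is harmless here, but it explains why recordings are a touch shorter than the nominal duration.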
Speech to text
This part is fairly simple: we just call the Baidu API. First, create an application in the Baidu AI console and note its ID, API key (AK), and secret key (SK); then fetch an access_token:
APP_ID = '22894511'
API_KEY = 'En7e3iR8dHO1F7Hx3Fy7M0vd'
SECRET_KEY = 'c1591BrrbodXP5zQuBcQSNim8xcL6ZiE'
client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)
host = f'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=6KLdtAifYT46PtyzULAGpIzu&client_secret=tCEEz7LC4XfD2RA4ojgdOUvBBd7i3T4Y'
access_token = requests.get(host).json()["access_token"]
def SpeechRecognition(path):
    with open(path, 'rb') as fp:
        voices = fp.read()
    try:
        # dev_pid 1537: Mandarin, 16 kHz
        result = client.asr(voices, 'wav', 16000, {'dev_pid': 1537})
        result_text = result["result"][0]
        print("you said: " + result_text)
        return result_text
    except KeyError:
        print("KeyError")
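The KeyError above occurs because on failure client.asr returns a dict with err_no and err_msg instead of a result list. Rather than catching the exception, we can inspect the response explicitly. A sketch against two made-up responses; the parse_asr name is my own, while the err_no/err_msg/result fields follow the shape Baidu's ASR API documents:

```python
def parse_asr(response):
    """Return the recognized text, or None after printing the API error."""
    if response.get("err_no", 0) != 0:
        print("ASR error %s: %s" % (response.get("err_no"),
                                    response.get("err_msg")))
        return None
    results = response.get("result") or []
    return results[0] if results else None

ok = {"err_no": 0, "result": ["你好"]}
bad = {"err_no": 3301, "err_msg": "audio quality problem"}
print(parse_asr(ok))   # 你好
parse_asr(bad)         # prints the error, returns None
```

Printing the actual err_no makes problems like bad sample rates or noisy input much easier to track down than a bare "KeyError".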
Turing robot reply
Here we only need to send the transcribed text to the Turing robot. For that we must also register a Turing robot account, which gives us the Turing API key (AK):
turing_api_key = "自己的AK"
api_url = "http://openapi.tuling123.com/openapi/api/v2"
headers = {'Content-Type': 'application/json;charset=UTF-8'}
def TuLing(text_words=""):
    req = {
        "reqType": 0,  # 0: text input
        "perception": {
            "inputText": {
                "text": text_words
            },
            "selfInfo": {
                "location": {
                    "city": "天津",
                    "province": "天津",
                    "street": "天津科技大学"
                }
            }
        },
        "userInfo": {
            "apiKey": turing_api_key,
            "userId": "Leosaf"
        }
    }
    response = requests.request("post", api_url, json=req, headers=headers)
    response_dict = json.loads(response.text)
    result = response_dict["results"][0]["values"]["text"]
    print("AI Robot said: " + result)
    return result
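If the apiKey is wrong or the daily quota is exhausted, the Turing API still returns JSON, but without a usable results list, and indexing results[0] above raises. A defensive extraction sketch; the extract_reply name and fallback text are my own, and the error payload below is only illustrative:

```python
def extract_reply(response_dict, fallback="我没听懂"):
    """Pull the text reply out of a Turing V2 response, with a fallback."""
    for item in response_dict.get("results") or []:
        values = item.get("values") or {}
        if "text" in values:
            return values["text"]
    return fallback

good = {"results": [{"resultType": "text", "values": {"text": "你好呀"}}]}
empty = {"intent": {"code": 4003}}   # hypothetical error payload with no results
print(extract_reply(good))   # 你好呀
print(extract_reply(empty))  # 我没听懂
```

With a fallback reply, a quota error makes the robot say "I didn't understand" instead of crashing the main loop.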
Text to speech
The reply comes back as text, and we will certainly want it converted to speech so the robot can actually talk back.
host = f'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=6KLdtAifYT46PtyzULAGpIzu&client_secret=tCEEz7LC4XfD2RA4ojgdOUvBBd7i3T4Y'
access_token = requests.get(host).json()["access_token"]
def SpeechSynthesis(text_words=""):
    # per: voice, vol: volume, pit: pitch, spd: speed
    result = client.synthesis(text_words, 'zh', 1, {'per': 4, 'vol': 10, 'pit': 9, 'spd': 5})
    # synthesis() returns MP3 bytes on success and an error dict on failure
    if not isinstance(result, dict):
        with open('app.mp3', 'wb') as f:
            f.write(result)
        os.system('mpg321 app.mp3')
Complete code
The complete code below uses pyaudio for recording; if you don't need that, substitute your own recording method.
import json
import os
import requests
from aip import AipSpeech
BaiDu_APP_ID = "22894511"
API_KEY = "En7e3iR8dHO1F7Hx3Fy7M0vd"
SECRET_KEY = "c1591BrrbodXP5zQuBcQSNim8xcL6ZiE"
client = AipSpeech(BaiDu_APP_ID, API_KEY, SECRET_KEY)
turing_api_key = '67d5386150e248fea4af3db80f4ca1ae'
api_url = 'http://openapi.tuling123.com/openapi/api/v2'
headers = {'Content-Type': 'application/json;charset=UTF-8'}
host = f'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=6KLdtAifYT46PtyzULAGpIzu&client_secret=tCEEz7LC4XfD2RA4ojgdOUvBBd7i3T4Y'
access_token = requests.get(host).json()["access_token"]
running = True
resultText, path = "", "output.wav"
def SoundRecording(path):
    import pyaudio
    import wave

    CHUNK = 512                # frames per buffer
    FORMAT = pyaudio.paInt16   # 16-bit samples
    CHANNELS = 1               # mono
    RATE = 16000               # 16 kHz sample rate
    RECORD_SECONDS = 5
    WAVE_OUTPUT_FILENAME = path

    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    print("recording...")
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("done")
    stream.stop_stream()
    stream.close()
    p.terminate()

    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()

def SpeechRecognition(path):
    with open(path, 'rb') as fp:
        voices = fp.read()
    try:
        # dev_pid 1537: Mandarin, 16 kHz
        result = client.asr(voices, 'wav', 16000, {'dev_pid': 1537})
        result_text = result["result"][0]
        print("you said: " + result_text)
        return result_text
    except KeyError:
        print("KeyError")

def TuLing(text_words=""):
    req = {
        "reqType": 0,  # 0: text input
        "perception": {
            "inputText": {
                "text": text_words
            },
            "selfInfo": {
                "location": {
                    "city": "天津",
                    "province": "天津",
                    "street": "天津科技大学"
                }
            }
        },
        "userInfo": {
            "apiKey": turing_api_key,
            "userId": "Leosaf"
        }
    }
    response = requests.request("post", api_url, json=req, headers=headers)
    response_dict = json.loads(response.text)
    result = response_dict["results"][0]["values"]["text"]
    print("AI Robot said: " + result)
    return result

def SpeechSynthesis(text_words=""):
    # per: voice, vol: volume, pit: pitch, spd: speed
    result = client.synthesis(text_words, 'zh', 1, {'per': 4, 'vol': 10, 'pit': 9, 'spd': 5})
    # synthesis() returns MP3 bytes on success and an error dict on failure
    if not isinstance(result, dict):
        with open('app.mp3', 'wb') as f:
            f.write(result)
        os.system('mpg321 app.mp3')

if __name__ == '__main__':
    while running:
        SoundRecording(path)
        resultText = SpeechRecognition(path)
        response = TuLing(resultText)
        # stop once the robot says goodbye
        if '退出' in response or '再见' in response or '拜拜' in response:
            running = False
        SpeechSynthesis(response)
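The exit condition in the main loop can be tidied into a small helper so that adding a new farewell keyword is a one-line change. A trivial sketch; the function name is my own:

```python
EXIT_WORDS = ('退出', '再见', '拜拜')

def should_exit(reply):
    """True if the robot's reply contains any farewell keyword."""
    return any(word in reply for word in EXIT_WORDS)

print(should_exit("好的,再见!"))   # True
print(should_exit("今天天气不错"))  # False
```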
Original: https://blog.csdn.net/qq_51718832/article/details/116229618
Author: Leosaf
Title: Building a voice robot with a Raspberry Pi
Original articles are protected by copyright. Please credit the source when reposting: https://www.johngo689.com/513034/
Reposted articles remain under the original author's copyright. Please credit the original author when reposting!