[API Analysis] Calling the Microsoft Edge browser's Read Aloud feature, step by step
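1. Sources
- GitHub: MsEdgeTTS, edge-TTS-record
- 52pojie (吾爱破解) forum post: "Microsoft Voice Assistant free edition, supports multiple functions, first release anywhere"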
2. Preparation
- Feature under analysis: the Edge browser
- Packet capture tool: Fiddler
- Request simulation: Postman
3. Main analysis steps
- Step 1: Work out how Edge's Read Aloud feature can be called from JavaScript. Fiddler captured no traffic for this call.
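function speakbyvoice(text, voice) {
    // Read the voice list at call time; it can still be empty right after page load.
    const voices = speechSynthesis.getVoices()
    const utter = new SpeechSynthesisUtterance(text)
    // Use the first voice whose name contains the requested fragment.
    for (const v of voices) {
        if (v.name.includes(voice)) {
            utter.voice = v
            break
        }
    }
    speechSynthesis.speak(utter)
    return utter
}
speakbyvoice("hello world", "Xiaoxiao")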
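One caveat with the snippet above: `speechSynthesis.getVoices()` can return an empty list until the browser fires its `voiceschanged` event. The following is a minimal sketch of waiting for the voices first. `findVoice` and `loadVoices` are hypothetical helper names, not part of any library, and the usage section only runs in a browser.

```javascript
// Hypothetical helper: return the first voice whose name contains `namePart`,
// or undefined so the caller can fall back to the default voice.
function findVoice(voices, namePart) {
  return voices.find((v) => v.name.includes(namePart));
}

// Hypothetical helper: resolve with the voice list once it is non-empty.
// `synth` is expected to be window.speechSynthesis in a browser.
function loadVoices(synth) {
  return new Promise((resolve) => {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices);
      return;
    }
    // The list is filled asynchronously; wait for the browser to announce it.
    synth.addEventListener("voiceschanged", () => resolve(synth.getVoices()), {
      once: true,
    });
  });
}

// Browser-only usage; skipped where speechSynthesis does not exist (e.g. Node.js).
if (typeof speechSynthesis !== "undefined") {
  loadVoices(speechSynthesis).then((voices) => {
    const utter = new SpeechSynthesisUtterance("hello world");
    const voice = findVoice(voices, "Xiaoxiao");
    if (voice) utter.voice = voice;
    speechSynthesis.speak(utter);
  });
}
```

Keeping the lookup in a separate pure function makes it easy to check against a hand-written voice list without a browser.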
- Step 2: Capture the traffic of edge-TTS-record instead. This yields one HTTP request and one WebSocket connection. Cross-checking against the MsEdgeTTS code gives:
{
    uri: "https://speech.platform.bing.com/consumer/speech/synthesize/readaloud/voices/list",
    query: {
        trustedclienttoken: "6A5AA1D4EAFF4E9FB37E23D68491D6F4"
    },
    method: "GET"
}
{
    uri: "wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1",
    query: {
        trustedclienttoken: "6A5AA1D4EAFF4E9FB37E23D68491D6F4"
    },
    sendmessage: {
        audioformat:
            X-Timestamp:Mon Jul 11 2022 17:50:42 GMT+0800 (中国标准时间)
            Content-Type:application/json; charset=utf-8
            Path:speech.config

            {"context":{"synthesis":{"audio":{"metadataoptions":{"sentenceBoundaryEnabled":"false","wordBoundaryEnabled":"true"},"outputFormat":"webm-24khz-16bit-mono-opus"}}}}
        ,
        ssml:
            X-RequestId:7e956ecf481439a86eb1beec26b4db5a
            Content-Type:application/ssml+xml
            X-Timestamp:Mon Jul 11 2022 17:50:42 GMT+0800 (中国标准时间)Z
            Path:ssml

            (SSML document whose text content is "hello world")
    }
}
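The two captures above can be sketched as small builder functions. This is an illustration only, not the MsEdgeTTS implementation: all function names are hypothetical, the token and paths are the values from the capture, and the `<speak>`/`<voice>` wrapper is an assumed minimal SSML form. The X-Timestamp header is left out, as it is in the C# code in section 4.

```javascript
// Values taken from the captures above.
const TRUSTED_CLIENT_TOKEN = "6A5AA1D4EAFF4E9FB37E23D68491D6F4";
const BASE = "speech.platform.bing.com/consumer/speech/synthesize/readaloud";

// HTTP GET endpoint that lists the available voices.
function buildVoicesListUrl(token = TRUSTED_CLIENT_TOKEN) {
  return `https://${BASE}/voices/list?trustedclienttoken=${encodeURIComponent(token)}`;
}

// WebSocket endpoint used for synthesis.
function buildWebSocketUrl(token = TRUSTED_CLIENT_TOKEN) {
  return `wss://${BASE}/edge/v1?trustedclienttoken=${encodeURIComponent(token)}`;
}

// Each text message is "Key:Value" header lines, a blank line, then the body.
function buildMessage(headers, body) {
  const head = Object.entries(headers)
    .map(([key, value]) => `${key}:${value}`)
    .join("\r\n");
  return `${head}\r\n\r\n${body}`;
}

// First message: selects the audio output format.
function buildSpeechConfig(outputFormat) {
  const body = JSON.stringify({
    context: {
      synthesis: {
        audio: {
          metadataoptions: { sentenceBoundaryEnabled: "false", wordBoundaryEnabled: "true" },
          outputFormat,
        },
      },
    },
  });
  return buildMessage(
    { "Content-Type": "application/json; charset=utf-8", Path: "speech.config" },
    body
  );
}

// Escape characters that would otherwise break the XML document.
function escapeXml(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", "'": "&apos;", '"': "&quot;" };
  return text.replace(/[&<>'"]/g, (c) => map[c]);
}

// Second message: the text to speak, wrapped in SSML (assumed minimal wrapper).
function buildSsml(requestId, lang, voice, text) {
  const ssml =
    `<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='${lang}'>` +
    `<voice name='${voice}'>${escapeXml(text)}</voice></speak>`;
  return buildMessage(
    { "X-RequestId": requestId, "Content-Type": "application/ssml+xml", Path: "ssml" },
    ssml
  );
}
```

A client would send `buildSpeechConfig(...)` first and `buildSsml(...)` second over the connection opened with `buildWebSocketUrl()`.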
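4. Writing the code
- WebSocket library: WebSocketSharp. If the latest version fails to install, install an older one. At the time of writing, the latest preview version was 1.0.3-rc11.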
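using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using WebSocketSharp;

namespace ConsoleTest
{
    internal class Program
    {
        // Builds the "speech.config" text message that selects the audio output format.
        static string ConvertToAudioFormatWebSocketString(string outputformat)
        {
            return "Content-Type:application/json; charset=utf-8\r\nPath:speech.config\r\n\r\n{\"context\":{\"synthesis\":{\"audio\":{\"metadataoptions\":{\"sentenceBoundaryEnabled\":\"false\",\"wordBoundaryEnabled\":\"false\"},\"outputFormat\":\"" + outputformat + "\"}}}}";
        }

        // Wraps the text in SSML. The <speak>/<voice> wrapper is an assumed minimal form.
        static string ConvertToSsmlText(string lang, string voice, string text)
        {
            // Escape the text so characters such as & and < do not break the XML.
            return $"<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='{lang}'><voice name='{voice}'>{System.Security.SecurityElement.Escape(text)}</voice></speak>";
        }

        // Builds the "ssml" text message: header lines, a blank line, then the SSML body.
        static string ConvertToSsmlWebSocketString(string requestId, string lang, string voice, string msg)
        {
            return $"X-RequestId:{requestId}\r\nContent-Type:application/ssml+xml\r\nPath:ssml\r\n\r\n{ConvertToSsmlText(lang, voice, msg)}";
        }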
        static void Main(string[] args)
        {
            var url = "wss://speech.platform.bing.com/consumer/speech/synthesize/readaloud/edge/v1?trustedclienttoken=6A5AA1D4EAFF4E9FB37E23D68491D6F4";
            var Language = "en-US";
            var Voice = "Microsoft Server Speech Text to Speech Voice (zh-CN, XiaoxiaoNeural)";
            var audioOutputFormat = "webm-24khz-16bit-mono-opus";
            // Last header line of a binary message; the audio bytes follow it.
            var binary_delim = "Path:audio\r\n";
            var msg = "Hello world";
            // Request id: a GUID without hyphens.
            var sendRequestId = Guid.NewGuid().ToString().Replace("-", "");
            // Audio bytes collected per request id.
            var dataBuffers = new Dictionary<string, List<byte>>();
            var webSocket = new WebSocket(url);
            // Accepts any server certificate; acceptable only for local experiments.
            webSocket.SslConfiguration.ServerCertificateValidationCallback = (sender, certificate, chain, sslPolicyErrors) => true;
            webSocket.OnOpen += (sender, e) => Console.WriteLine("[Log] WebSocket Open");
            webSocket.OnClose += (sender, e) => Console.WriteLine("[Log] WebSocket Close");
            webSocket.OnError += (sender, e) => Console.WriteLine("[Error] error message: " + e.Message);
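            webSocket.OnMessage += (sender, e) =>
            {
                if (e.IsText)
                {
                    var data = e.Data;
                    var requestId = Regex.Match(data, @"X-RequestId:(?<requestId>.*?)\r\n").Groups["requestId"].Value;
                    if (data.Contains("Path:turn.start"))
                    {
                        // Synthesis for this request has started.
                    }
                    else if (data.Contains("Path:turn.end"))
                    {
                        // All audio for this request has been sent.
                        webSocket.Close();
                    }
                    else if (data.Contains("Path:response"))
                    {
                        // Response metadata; not needed here.
                    }
                    else
                    {
                        Console.WriteLine("unknown message: " + data);
                    }
                }
                else if (e.IsBinary)
                {
                    var data = e.RawData;
                    // ASCII decoding maps one byte to one char, so string indices equal byte offsets.
                    var text = System.Text.Encoding.ASCII.GetString(data);
                    var requestId = Regex.Match(text, @"X-RequestId:(?<requestId>.*?)\r\n").Groups["requestId"].Value;
                    if (!dataBuffers.ContainsKey(requestId))
                        dataBuffers[requestId] = new List<byte>();
                    if (data.Length >= 3 && data[0] == 0x00 && data[1] == 0x67 && data[2] == 0x58)
                    {
                        // Header-only frame; MsEdgeTTS treats it as the last (empty) audio fragment.
                    }
                    else
                    {
                        // Everything after the "Path:audio\r\n" header line is audio data.
                        var index = text.IndexOf(binary_delim, StringComparison.Ordinal) + binary_delim.Length;
                        dataBuffers[requestId].AddRange(data.Skip(index));
                    }
                }
            };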
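            webSocket.Connect();
            var audioconfig = ConvertToAudioFormatWebSocketString(audioOutputFormat);
            webSocket.Send(audioconfig);
            webSocket.Send(ConvertToSsmlWebSocketString(sendRequestId, Language, Voice, msg));
            // Poll the connection state instead of spinning on IsAlive, which sends a ping on every call.
            while (webSocket.ReadyState != WebSocketState.Closed)
                System.Threading.Thread.Sleep(50);
            // The buffer is missing if no audio frame arrived for this request.
            List<byte> audio;
            var audioLength = dataBuffers.TryGetValue(sendRequestId, out audio) ? audio.Count : 0;
            Console.WriteLine("Received audio length in bytes: " + audioLength);
            Console.ReadKey(true);
        }
    }
}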
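The receiving side of the program above comes down to two parsing steps: reading the header lines of a text message, and cutting the audio bytes out of a binary message after the `Path:audio\r\n` line. Below is a minimal Node.js sketch of both, assuming the message layout described in this article. `parseTextMessage` and `extractAudio` are hypothetical names.

```javascript
// Delimiter that ends the header part of a binary message.
const BINARY_DELIM = Buffer.from("Path:audio\r\n", "ascii");

// Split a text message into its "Key:Value" headers and its body.
function parseTextMessage(message) {
  const split = message.indexOf("\r\n\r\n");
  const head = split === -1 ? message : message.slice(0, split);
  const body = split === -1 ? "" : message.slice(split + 4);
  const headers = {};
  for (const line of head.split("\r\n")) {
    // Split on the first colon only; values such as timestamps contain colons.
    const colon = line.indexOf(":");
    if (colon > 0) headers[line.slice(0, colon)] = line.slice(colon + 1);
  }
  return { headers, body };
}

// Return the audio bytes of a binary frame, or an empty buffer if the
// delimiter is missing. Works on bytes, so offsets are never skewed by decoding.
function extractAudio(frame) {
  const at = frame.indexOf(BINARY_DELIM);
  if (at === -1) return Buffer.alloc(0);
  return frame.subarray(at + BINARY_DELIM.length);
}

// Example with a hand-built frame: 2 leading bytes, header lines, 3 audio bytes.
const demoFrame = Buffer.concat([
  Buffer.from([0x00, 0x80]),
  Buffer.from("X-RequestId:abc\r\nContent-Type:audio/webm; codec=opus\r\nPath:audio\r\n", "ascii"),
  Buffer.from([1, 2, 3]),
]);
console.log(extractAudio(demoFrame).length);
console.log(parseTextMessage("X-RequestId:abc\r\nPath:turn.end\r\n\r\n{}").headers.Path);
```

Searching the raw bytes avoids the mismatch between character indices and byte offsets that appears when a binary frame is first decoded into a string.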
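5. Closing remarks
The WebSocket request was simulated successfully. One drawback: the Postman tests suggest that the audio outputFormat parameter only accepts webm-24khz-16bit-mono-opus, so the audio still has to be converted to other formats with a library such as ffmpeg. I have not found a convenient library for that yet, so this note stops here for now.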
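One possible route for the conversion step is to call the ffmpeg command-line tool directly rather than a wrapper library. The sketch below assumes that `ffmpeg` is installed and on the PATH, and that the received bytes were first written to a `.webm` file. The function names and file names are hypothetical.

```javascript
// Build the ffmpeg argument list. "-y" overwrites an existing output file and
// "-i" names the input; ffmpeg infers the target format from the output extension.
function buildFfmpegArgs(inputPath, outputPath) {
  return ["-y", "-i", inputPath, outputPath];
}

// Run ffmpeg and resolve with the output path on success.
// Rejects if ffmpeg cannot be started or exits with a non-zero code.
async function convertAudio(inputPath, outputPath) {
  const { spawn } = await import("node:child_process");
  return new Promise((resolve, reject) => {
    const proc = spawn("ffmpeg", buildFfmpegArgs(inputPath, outputPath), {
      stdio: "ignore",
    });
    proc.on("error", reject);
    proc.on("close", (code) => {
      if (code === 0) resolve(outputPath);
      else reject(new Error(`ffmpeg exited with code ${code}`));
    });
  });
}

// Usage (not executed here): convertAudio("hello.webm", "hello.mp3")
//   .then((out) => console.log("written", out))
//   .catch((err) => console.error(err.message));
```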
Original: https://blog.csdn.net/qq_41755979/article/details/125725807
Author: 永梦若曦
Title: [API Analysis] Calling the Microsoft Edge browser's Read Aloud feature, step by step