Selected Speech Synthesis Papers: Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

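Disclaimer: the "Selected Speech Synthesis Papers" series shares papers without translating them directly; the content is mainly my own summary of each paper and my personal views on it. If you repost this article, please credit the source.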


You are welcome to follow my WeChat official account: 低调砥砺前行 ("keep a low profile and forge ahead")


Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

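This paper is from Google (last updated on 2021.04.13). It mainly addresses the alignment problem of Parallel Tacotron: the new system needs no external alignment information. Link to the paper: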

https://arxiv.org/pdf/2103.14574.pdf

Link to the first paper, Parallel Tacotron: Non-Autoregressive and Controllable TTS:

https://arxiv.org/pdf/2010.11439.pdf

1 Background

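The Tacotron family is known for synthesizing high-quality speech, but its autoregressive decoding limits inference speed. The earlier paper therefore proposed the non-autoregressive Parallel Tacotron, whose synthesis quality is close to that of Tacotron 2. However, Parallel Tacotron needs external alignment information to train its duration decoder, so this paper builds on it and proposes Parallel Tacotron 2, which learns the alignment (an alignment matrix) with a novel attention mechanism.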

2 Architecture in Detail

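Figure 1 shows the architecture of Parallel Tacotron, which consists mainly of an input encoder, a residual encoder, a duration decoder, and a spectrogram decoder. This system still relies on external alignment information to train the duration decoder.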

[Figure 1: Parallel Tacotron architecture]
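Why external alignment matters is easiest to see with the simplest form of duration-based upsampling. The sketch below assumes FastSpeech-style length regulation, where each token's encoder vector is repeated d_k times; this is an illustrative assumption, not necessarily the exact upsampling used in Parallel Tacotron, and `repeat_upsample` is a hypothetical helper. It needs integer per-token durations (in practice taken from an external aligner), and because the repeat counts are integers, no gradient can flow back into them.

```python
from typing import List, Sequence


def repeat_upsample(encoder_out: Sequence[Sequence[float]],
                    durations: Sequence[int]) -> List[List[float]]:
    """Hypothetical hard upsampling: repeat token k's vector durations[k] times."""
    if len(encoder_out) != len(durations):
        raise ValueError("need exactly one duration per token")
    frames: List[List[float]] = []
    for vec, d in zip(encoder_out, durations):
        if d < 0:
            raise ValueError("durations must be non-negative")
        # Integer repeat counts: this step is not differentiable w.r.t. d
        frames.extend([list(vec) for _ in range(d)])
    return frames


# Three tokens with 2-dim encoder outputs and aligner-provided integer durations
H = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
frames = repeat_upsample(H, [2, 1, 3])
print(len(frames))
```

The output length is simply the sum of the durations, so every training utterance needs a trustworthy duration label per token.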

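Parallel Tacotron 2 instead models duration with the structure shown in the figure below. First, a duration predictor estimates the duration of each token; then a learned upsampling module uses these durations to learn an attention matrix W and an auxiliary attention context C. Because the spectrogram predicted this way generally does not have the same number of frames as the ground-truth spectrogram, a frame-by-frame loss cannot be computed directly, so soft-DTW is used to compute the loss instead. The final loss is given by Equation 7.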

[Figure: duration modeling in Parallel Tacotron 2 (duration predictor and learned upsampling)]
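The learned upsampling step can be sketched roughly as follows. This is a simplified, hypothetical illustration: token boundaries come from the cumulative sum of the predicted durations, each frame gets its offset from every token's start (S) and to every token's end (E), and a softmax over tokens turns per-frame scores into the attention matrix W. In the paper, W and C are learned; here both are replaced by fixed stand-in functions (the `sharpness` parameter and the (S, E) context vector are assumptions), so nothing is trained. What the sketch does show is that W is a smooth function of the durations, which is what lets gradients reach the duration predictor.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    # Numerically stable softmax over a list of scores
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def learned_upsample_sketch(
    H: Sequence[Sequence[float]],
    durations: Sequence[float],
    sharpness: float = 10.0,
) -> Tuple[List[List[float]], List[List[float]], List[List[float]]]:
    """Hypothetical, untrained stand-in for duration-driven soft upsampling.

    Returns (O, W, C_ctx): O[t] = sum_k W[t][k] * H[k], each row W[t] sums
    to 1 over tokens, and C_ctx[t] is an attention-weighted (S, E) vector.
    """
    # Token end e_k = cumulative sum of durations; start s_k = e_k - d_k
    ends: List[float] = []
    acc = 0.0
    for d in durations:
        acc += d
        ends.append(acc)
    starts = [e - d for e, d in zip(ends, durations)]
    num_frames = max(1, round(acc))  # output length from the total duration
    dim = len(H[0])
    O, W, C_ctx = [], [], []
    for t in range(num_frames):
        center = t + 0.5                   # center of frame t
        S = [center - s for s in starts]   # offset from each token's start
        E = [e - center for e in ends]     # offset to each token's end
        # Fixed stand-in for the learned scorer: 0 inside [s_k, e_k],
        # increasingly negative the further the frame lies outside it
        logits = [-sharpness * (max(0.0, -s) + max(0.0, -e)) for s, e in zip(S, E)]
        w = softmax(logits)
        W.append(w)
        O.append([sum(w[k] * H[k][j] for k in range(len(H))) for j in range(dim)])
        # Stand-in for the auxiliary context C: attention-weighted offsets
        C_ctx.append([sum(wk * sk for wk, sk in zip(w, S)),
                      sum(wk * ek for wk, ek in zip(w, E))])
    return O, W, C_ctx


H = [[1.0, 0.0], [0.0, 1.0]]   # two tokens with 2-dim encoder outputs
O, W, C_ctx = learned_upsample_sketch(H, [2.0, 3.0])
print(len(O))
print([round(x, 3) for x in O[0]], [round(x, 3) for x in O[4]])
```

With durations 2.0 and 3.0 the sketch produces 5 frames; early frames attend almost entirely to the first token and late frames to the second, with softer weights near the boundary.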

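Because the upsampled output has round(Σ d_k) frames while the ground truth has its own frame count, a frame-by-frame loss is undefined; soft-DTW instead compares the two sequences under a smoothed best alignment. Below is a sketch of the standard soft-DTW recursion, R[i][j] = cost(i, j) + softmin_γ(R[i-1][j], R[i][j-1], R[i-1][j-1]) with softmin_γ(a) = -γ log Σ exp(-a/γ). The L1 frame cost and the γ default are assumptions, and the variant used in the paper may differ in its details. The table fill runs in O(n·m) time.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def softmin(values: Sequence[float], gamma: float) -> float:
    # Smoothed minimum -gamma * log(sum(exp(-v / gamma))), shifted by the
    # hard minimum for stability; infinite entries contribute exp(-inf) = 0
    lo = min(values)
    return lo - gamma * math.log(sum(math.exp(-(v - lo) / gamma) for v in values))


def l1(a: Frame, b: Frame) -> float:
    # Assumed per-frame cost: L1 distance between two spectrogram frames
    return sum(abs(x - y) for x, y in zip(a, b))


def soft_dtw(pred: Sequence[Frame], target: Sequence[Frame], gamma: float = 0.1) -> float:
    """Standard soft-DTW between two frame sequences of possibly different lengths."""
    n, m = len(pred), len(target)
    inf = float("inf")
    R: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = l1(pred[i - 1], target[j - 1])
            R[i][j] = cost + softmin([R[i - 1][j], R[i][j - 1], R[i - 1][j - 1]], gamma)
    return R[n][m]


pred = [[0.0], [1.0]]             # 2 predicted frames
target = [[0.0], [0.0], [1.0]]    # 3 ground-truth frames: same content, stretched
other = [[1.0], [1.0], [1.0]]     # 3 frames with different content
print(soft_dtw(pred, target, gamma=0.01), soft_dtw(pred, other, gamma=0.01))
```

The stretched target scores close to zero despite the length mismatch, while the different target scores much higher. Since softmin is smooth, the loss is differentiable with respect to the predicted frames, and through the upsampling with respect to the durations.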

3 Experimental Results

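Tables 1, 2, and 3 show that Parallel Tacotron 2 is preferred over both Parallel Tacotron and Tacotron 2 in subjective preference tests. Figure 3 shows speaking-rate (speed) control with the proposed model.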

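Speed control of the kind shown in Figure 3 is natural for a model with explicit durations. A common approach, assumed here rather than taken from the paper, is to divide every predicted duration by a speed factor before upsampling, so the output length changes proportionally; `scale_durations` is a hypothetical helper.

```python
from typing import List, Sequence


def scale_durations(durations: Sequence[float], speed: float) -> List[float]:
    """Hypothetical pace control: speed > 1 shortens every token (faster speech)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return [d / speed for d in durations]


d = [2.0, 3.0, 5.0]   # predicted per-token durations, in frames
for speed in (0.8, 1.0, 1.25):
    scaled = scale_durations(d, speed)
    # After upsampling, the number of output frames follows the total duration
    print(speed, round(sum(scaled)))
```

Scaling uniformly keeps the relative rhythm of the tokens; a per-token factor could slow down or speed up individual words instead.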

4 Summary

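This paper mainly addresses the alignment problem of Parallel Tacotron: by combining a duration predictor, learned upsampling, and a soft-DTW loss, the system needs no external alignment information.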

Original: https://blog.csdn.net/liyongqiang2420/article/details/116154164
Author: 我叫永强
Title: Selected Speech Synthesis Papers: Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

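Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/526655/

Reposted articles remain under the original author's copyright. When reposting, please credit the original author.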
