Selected Speech Synthesis Papers: Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Mod



Disclaimer: the Selected Speech Synthesis Papers series shares papers rather than translating them directly; the content is mainly my own summary of and personal views on each paper. If you repost it, please cite the source.



You are welcome to follow my WeChat official account: keep a low profile and forge ahead

Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

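This paper, updated by Google on 2021.04.13, mainly addresses the alignment problem of Parallel Tacotron: the proposed system no longer needs external alignment information. Link to the paper: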



Parallel Tacotron: Non-Autoregressive and Controllable TTS



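1 Background

The Tacotron series is known for synthesizing high-quality speech, but its autoregressive decoding limits synthesis speed. The non-autoregressive Parallel Tacotron was therefore proposed, with synthesis quality close to that of Tacotron 2. However, Parallel Tacotron needs external alignment information to train its duration decoder. This paper builds on Parallel Tacotron and proposes Parallel Tacotron 2, which learns the alignment itself with a novel attention mechanism (an alignment matrix).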

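2 Architecture

Figure 1 shows the architecture of Parallel Tacotron, which consists mainly of an input encoder, a residual encoder, a duration decoder, and a spectrogram decoder. This system still relies on external alignment information to train the duration decoder.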

[Figure 1: Parallel Tacotron architecture]
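To make the dependence on external alignments concrete, here is a minimal sketch of duration-based upsampling in the style of a FastSpeech length regulator: each token's encoder vector is repeated for an integer number of frames. This is an illustrative assumption, not necessarily the exact upsampler used in Parallel Tacotron; the function `repeat_upsample` and the toy durations are hypothetical. Because the repeat counts are discrete, the spectrogram loss cannot be backpropagated into them, so duration targets have to come from elsewhere, such as an external aligner.

```python
import numpy as np

def repeat_upsample(encoder_out: np.ndarray, durations: np.ndarray) -> np.ndarray:
    """Expand token-level features (K, D) to frame level by repeating each
    token's vector durations[k] times (FastSpeech-style length regulator)."""
    if encoder_out.shape[0] != durations.shape[0]:
        raise ValueError("exactly one duration per token is required")
    # Integer repeat counts: this step is not differentiable w.r.t. durations.
    return np.repeat(encoder_out, durations, axis=0)  # (sum(durations), D)

H = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])  # toy encoder outputs, K = 3
d = np.array([2, 1, 3])  # hypothetical durations from an external aligner
frames = repeat_upsample(H, d)
print(frames.shape)  # (6, 2)
```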

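Parallel Tacotron 2 estimates durations mainly with the structure shown in the figure below. First, a duration predictor estimates the duration of each token; then a learned upsampling module uses these durations to learn an attention matrix W and an auxiliary attention context C. Because the spectrogram predicted this way generally has a different number of frames from the ground-truth spectrogram, a frame-by-frame loss cannot be computed directly, so Soft-DTW is used to compute the spectrogram loss. The final loss is given by Equation 7.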
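The duration-to-alignment step can be sketched as follows, assuming real-valued durations d_k. Token start and end positions come from a cumulative sum (s_k = d_1 + ... + d_{k-1}, e_k = s_k + d_k), and the matrices S[t, k] = t - s_k and E[t, k] = e_k - t describe where each frame sits relative to each token. In the paper, W is learned from quantities like these together with encoder features; here that learned network is replaced by a fixed, hypothetical scoring rule (`toy_upsampling_weights`), frames are placed at their centres t + 0.5, and the auxiliary context C is omitted. The point is that W is a continuous, piecewise-differentiable function of the durations, so gradients can reach the duration predictor.

```python
import numpy as np

def token_boundaries(durations: np.ndarray):
    """Start s_k and end e_k (in frames) of each token from real-valued durations."""
    ends = np.cumsum(durations)   # e_k = d_1 + ... + d_k
    starts = ends - durations     # s_k = e_k - d_k
    return starts, ends

def grid_matrices(durations: np.ndarray):
    """S[t, k] = t - s_k and E[t, k] = e_k - t, using frame centres t + 0.5."""
    starts, ends = token_boundaries(durations)
    num_frames = int(np.round(ends[-1]))                     # total output length T
    t = (np.arange(num_frames, dtype=float) + 0.5)[:, None]  # (T, 1)
    return t - starts[None, :], ends[None, :] - t            # each (T, K)

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def toy_upsampling_weights(durations: np.ndarray, sharpness: float = 1.0) -> np.ndarray:
    """HYPOTHETICAL stand-in for the learned network: a frame scores 0 for a
    token whose interval contains it and a negative value otherwise; a softmax
    over tokens turns the scores into attention weights W of shape (T, K)."""
    S, E = grid_matrices(durations)
    scores = sharpness * (np.minimum(S, 0.0) + np.minimum(E, 0.0))
    return softmax(scores, axis=1)

d = np.array([2.0, 1.0, 3.0])  # toy predicted durations for K = 3 tokens
W = toy_upsampling_weights(d)  # (6, 3); each row sums to 1
H = np.eye(3)                  # toy token-level encoder outputs (K, D)
O = W @ H                      # frame-level features (T, D)
print(W.argmax(axis=1))        # [0 0 1 2 2 2]
```

Unlike integer repetition, each output frame is a soft mixture of token features, which is what lets the spectrogram loss train the durations without an external aligner.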

[Figures: Parallel Tacotron 2 duration modeling and training loss equations, including Equation 7]
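Since the predicted and ground-truth spectrograms can have different numbers of frames, the loss has to find its own alignment. Below is a minimal sketch of the standard Soft-DTW recursion (Cuturi and Blondel, 2017), which replaces the hard minimum of dynamic time warping with a smooth minimum controlled by gamma. The per-frame L1 distance and the function names are assumptions for illustration; the paper's exact Soft-DTW variant and settings may differ, and the pure-Python double loop is O(NM), so it only suits small examples.

```python
import numpy as np

def softmin(values: np.ndarray, gamma: float) -> float:
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    z = -values / gamma
    zmax = z.max()
    return float(-gamma * (zmax + np.log(np.exp(z - zmax).sum())))

def soft_dtw(pred: np.ndarray, target: np.ndarray, gamma: float = 0.1) -> float:
    """Soft-DTW between pred (N, D) and target (M, D); N and M may differ."""
    n, m = len(pred), len(target)
    # Pairwise L1 distance between every predicted and target frame: (N, M).
    cost = np.abs(pred[:, None, :] - target[None, :, :]).sum(axis=-1)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Diagonal match, or repeat a frame on either side (time warping).
            prev = np.array([R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]])
            R[i, j] = cost[i - 1, j - 1] + softmin(prev, gamma)
    return float(R[n, m])

target = np.array([[0.0], [1.0], [1.0], [2.0]])  # 4 ground-truth frames
pred = np.array([[0.0], [1.0], [2.0]])           # 3 predicted frames
loss = soft_dtw(pred, target, gamma=0.01)        # near 0: sequences match after warping
```

As gamma approaches 0, Soft-DTW approaches ordinary DTW; larger gamma gives smoother gradients at the cost of a looser alignment.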

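3 Experimental Results

Tables 1, 2, and 3 show that Parallel Tacotron 2 is preferred over both Parallel Tacotron and Tacotron 2 in subjective preference tests. Figure 3 shows speaking-rate (speed) control with the proposed model.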

[Tables 1-3 and Figure 3 from the paper]
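Duration-based models commonly control speaking rate by rescaling the predicted durations before upsampling. Assuming that is roughly how the speed control in Figure 3 works (the paper's exact procedure may differ), a minimal sketch follows; `scale_durations` and `pace` are hypothetical names, with pace > 1 giving slower speech and pace < 1 faster speech.

```python
import numpy as np

def scale_durations(durations: np.ndarray, pace: float) -> np.ndarray:
    """Rescale per-token durations; pace > 1 slows speech, pace < 1 speeds it up."""
    if pace <= 0:
        raise ValueError("pace must be positive")
    return durations * pace

d = np.array([2.0, 1.0, 3.0])  # hypothetical predicted durations, in frames
totals = {pace: float(scale_durations(d, pace).sum()) for pace in (0.5, 1.0, 2.0)}
print(totals)  # {0.5: 3.0, 1.0: 6.0, 2.0: 12.0}
```

The scaled durations would then feed the same upsampler, so the total number of output frames, and hence the utterance length, changes accordingly.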

4 Summary

This paper mainly addresses the alignment problem of Parallel Tacotron: the proposed system does not need external alignment information.

Original: https://blog.csdn.net/liyongqiang2420/article/details/116154164
Author: 我叫永强
Title: Selected Speech Synthesis Papers: Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Mod




