Neural Speech Synthesis in German

Abstract

While many speech synthesis systems based on deep neural networks have been thoroughly evaluated and released for free use in English, models for languages with far fewer active speakers, such as German, are rarely trained and most often not published for general use. This work covers specific challenges in training text-to-speech models for the German language, including dataset selection and data preprocessing, and presents the training process for multiple models of an end-to-end text-to-speech system based on a combination of Tacotron 2 and Multi-Band MelGAN. All model combinations were evaluated using the mean opinion score (MOS), which yielded results comparable to those of models in the literature that were trained and evaluated on English datasets. In addition, empirical analyses identified distinct aspects that influence the quality of such systems, based on subjective user experience. All trained models are released for public use.
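
As an illustration of the evaluation metric named above, a mean opinion score is conventionally the arithmetic mean of listener ratings on a 1 to 5 scale, often reported with a confidence interval. The following is a minimal sketch of that calculation only. The function name, the model labels, and the ratings are hypothetical and are not taken from the paper, and the normal-approximation interval is an assumption about how such scores are commonly reported, not a description of the authors' procedure.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes ratings on the usual 1-5 absolute category scale and a
    normal approximation for the interval (z = 1.96).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical listener ratings for two model combinations (illustrative only).
ratings_by_model = {
    "model_a": [4, 5, 4, 3, 4],
    "model_b": [3, 4, 3, 3, 4],
}

for name, ratings in ratings_by_model.items():
    mos, ci = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {ci:.2f}")
```

With only a handful of ratings per model the interval is wide, which is why listening tests typically collect many ratings per system before comparing scores.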

More about this title

Title: Neural Speech Synthesis in German
Published in: 14th International Conference on Advances in Human-oriented and Personalized Mechanisms, Technologies, and Services (CENTRIC 2021)
Authors/Editors: Johannes Wirth, Pascal Puchtler, Prof. Dr. René Peinl
Publication date: 7 October 2021
Citation: Wirth, Johannes; Puchtler, Pascal; Peinl, René (2021): Neural Speech Synthesis in German. 14th International Conference on Advances in Human-oriented and Personalized Mechanisms, Technologies, and Services (CENTRIC 2021).