Neural Speech Synthesis in German

Abstract

While many speech synthesis systems based on deep neural networks are thoroughly evaluated and released for free use in English, models for languages with far fewer active speakers, such as German, are rarely trained and most often not published for common use. This work covers specific challenges in training text-to-speech models for the German language, including dataset selection and data preprocessing, and presents the training process for multiple models of an end-to-end text-to-speech system based on a combination of Tacotron 2 and Multi-Band MelGAN. All model compositions were evaluated using the mean opinion score (MOS), which revealed results comparable to those of models in the literature that are trained and evaluated on English datasets. In addition, empirical analyses identified distinct aspects influencing the quality of such systems, based on subjective user experience. All trained models are released for public use.

More about this title

Title	Neural Speech Synthesis in German
Published in	14th International Conference on Advances in Human-oriented and Personalized Mechanisms, Technologies, and Services (CENTRIC 2021)
Authors	Johannes Wirth, Pascal Puchtler, Prof. Dr. René Peinl
Publication date	2021-10-07
Citation	Wirth, Johannes; Puchtler, Pascal; Peinl, René (2021): Neural Speech Synthesis in German. 14th International Conference on Advances in Human-oriented and Personalized Mechanisms, Technologies, and Services (CENTRIC 2021).