Quality Assurance for Speech Synthesis with ASR


Autoregressive TTS models are still widely used. Because of their stochastic nature, output quality can vary from very good to completely unusable from one inference to the next. In this publication, we propose the percentage of completely correctly transcribed sentences (PCTS) produced by an ASR system as a new objective quality measure for TTS inference. PCTS is easy to measure and represents the intelligibility dimension of a typical subjective evaluation with mean opinion score (MOS). We show that PCTS yields results similar to a subjective MOS evaluation. A more detailed, semi-automatic analysis of the differences between the ASR transcripts of the TTS speech and the text used to generate that speech can help identify problems in the TTS training data that are harder to find with other methods.
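The PCTS idea can be illustrated with a short sketch. It assumes that the TTS output has already been transcribed by some ASR system, so only the reference texts and the transcripts are needed. The text normalization (lowercasing, stripping ASCII punctuation) and the use of Python's `difflib` for word alignment are illustrative assumptions, not necessarily what the authors used. The function names `normalize`, `pcts`, `word_errors` and `most_frequent_errors` are hypothetical.

```python
import string
import difflib
from collections import Counter
from typing import Iterable, List, Tuple


def normalize(text: str) -> List[str]:
    """Assumed normalization: lowercase, remove ASCII punctuation, split into words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()


def pcts(references: Iterable[str], transcripts: Iterable[str]) -> float:
    """Percentage of sentences whose ASR transcript matches the reference exactly
    (after the assumed normalization)."""
    refs, hyps = list(references), list(transcripts)
    if len(refs) != len(hyps):
        raise ValueError("references and transcripts must have the same length")
    if not refs:
        raise ValueError("at least one sentence is required")
    correct = sum(normalize(r) == normalize(h) for r, h in zip(refs, hyps))
    return 100.0 * correct / len(refs)


def word_errors(reference: str, transcript: str) -> List[Tuple[str, str, str]]:
    """Word-level differences as (operation, reference words, transcript words)."""
    ref, hyp = normalize(reference), normalize(transcript)
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [
        (tag, " ".join(ref[i1:i2]), " ".join(hyp[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replacements, deletions and insertions
    ]


def most_frequent_errors(references: Iterable[str], transcripts: Iterable[str],
                         top_n: int = 10) -> List[Tuple[Tuple[str, str, str], int]]:
    """Aggregate word errors over a corpus so that recurring problems stand out."""
    counts: Counter = Counter()
    for r, h in zip(references, transcripts):
        counts.update(word_errors(r, h))
    return counts.most_common(top_n)


if __name__ == "__main__":
    refs = ["The cat sat on the mat.", "Hello world"]
    hyps = ["the cat sat on the mat", "hello word"]
    print(pcts(refs, hyps))                    # 50.0
    print(word_errors("Hello world", "hello word"))
```

Counting recurring error patterns across many sentences is one plausible way to make the semi-automatic error analysis practical: a word that is consistently mis-transcribed may point to mispronunciations or inconsistent transcriptions in the TTS training data, which a human can then inspect.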

More about this title

Title: Quality Assurance for Speech Synthesis with ASR
Published in: Intelligent Systems Conference (IntelliSys 2022)
Publisher: ---
Issue: ---
Volume: ---
ISBN: ---
Authors/Editors: Prof. Dr. René Peinl, Johannes Wirth
Pages: ---
Publication date: 1 March 2022
Project title: ---
Citation: Peinl, René; Wirth, Johannes (2022): Quality Assurance for Speech Synthesis with ASR. Intelligent Systems Conference (IntelliSys 2022).