TalkPro: A Multimodal Language Learning and Evaluation System

Abstract

As universities around the world welcome increasing numbers of international students, there is a growing demand for scalable, objective tools that can support both language learning and applicant selection based on spoken language proficiency. In particular, pronunciation and comprehension remain persistent challenges for non-native speakers and are key factors for communication in academic environments. Traditional methods of assessing these skills are either labor-intensive or rely on surface-level metrics such as transcription accuracy, which do not fully capture a learner’s communicative competence. This work introduces TalkPro, a multimodal system for pronunciation and comprehension assessment as well as language learning, designed to address this need. The system provides continuous, personalized feedback on learners’ spoken language, with a specific focus on phoneme-level accuracy and articulatory patterns. Instead of relying solely on conventional speech recognition outputs, which can often compensate for even major pronunciation errors, TalkPro generates detailed acoustic analyses that pinpoint learner-specific difficulties. These include not only phoneme-level errors but also recurring articulatory tendencies, such as misplacement of the tongue, incorrect voicing, or inappropriate manner of articulation. The system also includes a text-to-speech (TTS) engine that generates spoken content adapted to a learner's vocabulary gaps, followed by targeted comprehension questions. TTS can also be used to test listening comprehension, either word by word in a dictation style or semantically, using a large language model (LLM) as a judge. Together, these components form an integrated approach to pronunciation and comprehension training in a blended learning environment and can also be used for automated assessment.
Preliminary experiments with students coming from India to Germany indicate that phoneme-level ASR effectively identifies pronunciation errors, whereas grapheme-level ASR tends to overlook them. Future research will involve a comprehensive evaluation of the automated results against human judgment, alongside the expansion of TalkPro's training capabilities with LLM-based reading comprehension modules that prioritize conceptual understanding over traditional verbatim recall.
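
The contrast between phoneme-level and grapheme-level recognition can be illustrated with a minimal sketch. It is not TalkPro's implementation: the function names `align_phonemes` and `phoneme_error_rate` are hypothetical, and the phoneme sequences are made up for illustration. The sketch assumes that a phoneme recognizer has already produced a phoneme sequence for the learner's utterance, and it aligns that sequence with a reference pronunciation using plain Levenshtein alignment.

```python
from typing import List, Tuple

Op = Tuple[str, str, str]  # (operation, reference phoneme, recognized phoneme)


def align_phonemes(ref: List[str], hyp: List[str]) -> List[Op]:
    """Align two phoneme sequences with unit-cost Levenshtein distance.

    Returns one entry per aligned position, with the operation being
    "match", "sub", "del" (phoneme missing in hyp) or "ins" (extra phoneme).
    """
    n, m = len(ref), len(hyp)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match / substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion

    # Walk back from the bottom-right cell to recover the operations.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        diag_cost = 0 if (i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]) else 1
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + diag_cost:
            ops.append(("match" if diag_cost == 0 else "sub", ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], ""))
            i -= 1
        else:
            ops.append(("ins", "", hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def phoneme_error_rate(ref: List[str], hyp: List[str]) -> float:
    """Share of non-matching alignment operations relative to the reference length."""
    errors = sum(1 for op, _, _ in align_phonemes(ref, hyp) if op != "match")
    return errors / max(len(ref), 1)


# Hypothetical example: the word "think" realized with /t/ instead of /θ/.
reference = ["θ", "ɪ", "ŋ", "k"]
recognized = ["t", "ɪ", "ŋ", "k"]
for op in align_phonemes(reference, recognized):
    print(op)
print(phoneme_error_rate(reference, recognized))
```

In this made-up case, a grapheme-level recognizer could plausibly still output the word "think", so a transcription-based metric would report no error, while the phoneme-level alignment exposes the substitution of /θ/ by /t/.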


More about this title

Title TalkPro: A Multimodal Language Learning and Evaluation System
Published in 2nd International Conference on Education Research - ICER 2025
Authors Johannes Wirth, Prof. Dr. René Peinl
Publication date 2025-10-31
Project title M4-SKI
Citation Wirth, Johannes; Peinl, René (2025): TalkPro: A Multimodal Language Learning and Evaluation System. 2nd International Conference on Education Research - ICER 2025.