Drossel, Matthias; Gläßel, Daniel; Nasri, Fatemeh; Schmola, Gerald (2024)
2024 (45), pp. 2096-2109.
Wirth, Johannes; Peinl, René (2024)
4th European Conference on the Impact of Artificial Intelligence and Robotics (ICAIR 2024) 2024.
As the output quality of neural networks in the fields of automatic speech recognition (ASR) and text-to-speech (TTS) continues to improve, new opportunities are becoming available to train models in a weakly supervised fashion, thus minimizing the manual effort required to annotate new audio data for supervised training. While weak supervision has recently shown very promising results in the domain of ASR, speech synthesis has not yet been thoroughly investigated regarding this technique despite requiring the equivalent training dataset structure of aligned audio-transcript pairs.
In this work, we compare the performance of TTS models trained using a well-curated and manually labeled training dataset to others trained on the same audio data with text labels generated using both grapheme- and phoneme-based ASR models. Phoneme-based approaches seem especially promising, since even for wrongly predicted phonemes, the resulting word is more likely to sound similar to the originally spoken word than for grapheme-based predictions.
For evaluation and ranking, we generate synthesized audio outputs from all previously trained models using input texts sourced from a selection of speech recognition datasets covering a wide range of application domains. These synthesized outputs are subsequently fed into multiple state-of-the-art ASR models with their output text predictions being compared to the initial TTS model input texts. This comparison enables an objective assessment of the intelligibility of the audio outputs from all TTS models, by utilizing metrics like word error rate and character error rate.
Our results show that models trained on data generated with weak supervision not only achieve quality comparable to models trained on manually labeled datasets but can even outperform them, even for small, well-curated speech datasets. These findings suggest that the future creation of labeled datasets for supervised training of TTS models may not require any manual annotation but can be fully automated.
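The intelligibility metrics named in this abstract, word error rate and character error rate, can be illustrated with a minimal sketch. It assumes the usual definition (Levenshtein edit distance between reference and ASR hypothesis, divided by the reference length) and a simple normalization of lowercasing and whitespace tokenization. The function names and the normalization are assumptions made for illustration, not the authors' evaluation pipeline.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: word-level edit distance divided by the number of reference words."""
    # Assumption: lowercasing and whitespace splitting as the only normalization.
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER: character-level edit distance divided by the reference length."""
    ref_chars = list(reference.lower())
    hyp_chars = list(hypothesis.lower())
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    tts_input = "the cat sat on the mat"     # text given to the TTS model
    asr_output = "the cat sat on a mat"      # hypothetical ASR transcript of the audio
    print(word_error_rate(tts_input, asr_output))
    print(character_error_rate(tts_input, asr_output))
```

In the setting described above, the reference would be the text fed to the TTS model and the hypothesis the transcript an ASR model produces from the synthesized audio; lower values indicate more intelligible speech.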
Peinl, René; Wagener, Andreas; Lehmann, Marc (2024)
4th European Conference on the Impact of Artificial Intelligence and Robotics (ICAIR 2024), Lisbon, Portugal 2024.
Many publications discuss the biases found in generative AI solutions such as large language models (LLMs, e.g., Mistral) or text-to-image models (T2IMs, e.g., Stable Diffusion). However, hardly any publication questions what kind of behavior is actually desired, not only by a couple of researchers, but by society in general. Most researchers in this area seem to assume that there is common agreement, but political debate in other areas shows that this is seldom the case, even within a single country. Climate change, for example, is an empirically well-proven scientific fact, and 197 countries (including Germany) have committed in the Paris Agreement to doing their best to limit global warming to a maximum of 1.5°C, yet renowned German scientists still call LLMs biased when they state that climate change is human-made and that humanity is not doing enough to stop it. This trend is especially visible in Western individualistic societies that favor personal well-being over the common good. In this article, we explore different aspects of biases found in LLMs and T2IMs, highlight potential divergence in the perception of ethically desirable outputs, and discuss potential solutions with their advantages and drawbacks from the perspective of society. The analysis is carried out in an interdisciplinary manner, with the authors coming from backgrounds as diverse as business information systems, political science, and law. Our contribution brings new insights to this debate and sheds light on an important aspect of the discussion that has been largely ignored so far.
Wolff, Dietmar (2024)
Expert discussion of the Bavarian State Office for Care (Bayerisches Landesamt für Pflege), online, 04.12.2024.
Wagener, Andreas (2024)
Nerdwärts.de https://nerdwaerts.de/2024/12/wie-vr-und-andere-digitale-technologien-den-vergnuegungspark-von-morgen-formen/ 2024.
Amusement parks face the challenge of evolving in an increasingly digitalized world in order to remain attractive to a broad audience. Digital technologies, especially virtual reality (VR), open up new perspectives here. They not only change the visitor experience but also have a significant impact on business success.
Kemnitzer, Jonas; Groth, Christian (2024)
Proceedings of the 2nd International Conference on AI-generated Content 2024.
In this paper, we present a Stable Diffusion-based zero-shot approach to realistically transform the image of a human body into a more fit version of the depicted person. To this end, we combine a modified Stable Diffusion model with inpainting techniques and incorporated constraints. We introduce a prototype that allows users to upload a photo and visualize a more fit version of themselves. We evaluated our approach in various experiments and focused on the applicability and effectiveness of these techniques, with attention to gender-specific results. This work contributes to the fields of computer vision and generative AI by demonstrating practical applications and identifying areas for improvement in realistic body transformation visualizations.
Wolff, Dietmar; Schmidt, Lisa-Marie (2024)
Newsletter Digital Insight 12/2024, pp. 9-10.
Malek, Khadhraoui; Plenk, Valentin (2024)
DOI: 10.57944/1051-189
This book is intended as a practical guide to the concepts of hardware and software configuration for industrial production automation using the TIA PORTAL software platform. It is written primarily for undergraduate students of electrical, mechanical and industrial engineering and for engineering students in related programs, but anyone working in the field of automation will benefit from reading it.
This book deals with the use of S7-1200 and S7-1500 PLCs to control operational components in automated systems, in accordance with current standards. It is a good starting point for exploring Siemens' Totally Integrated Automation (TIA) product range.
The book also contains practical examples and explanatory diagrams of the graphical interfaces of the TIA PORTAL software, which illustrate the programming and configuration procedures and techniques.
Those interested in developing local industrial communication networks to implement centralised and decentralised control system architectures will also find this book useful. It details techniques provided by Siemens that are well suited to programming plans under the TIA PORTAL platform.
It also introduces the reader to Human Machine Interface (HMI) development, covering topics such as hardware configuration, software programming, networking, testing and validation.
This book is an invaluable resource for those new to the field of industrial automation, as well as for teachers wishing to teach and gain expertise in this specialised area.
Wagener, Andreas (2024)
Nerdwärts.de https://nerdwaerts.de/2024/11/wie-der-simplification-bias-unseren-sinn-fuer-gute-entscheidungen-truebt/ 2024.
Of course, one should not overcomplicate things. Simple solutions are often perfectly sensible. But given the complexity of our environment, we apparently tend to attribute problems to supposedly clear-cut causes. This “simplification bias” increasingly shapes public discourse, but it also leads to poor decisions in management.
Wolff, Dietmar (2024)
Keynote talk, v3d: Die Digitalisierung des Personalmanagements (HR digital), Kassel, 26.11.2024.
Wagener, Andreas (2024)
Nerdwärts.de https://nerdwaerts.de/2024/11/wie-ki-das-loyalty-marketing-veraendert/ 2024.
Loyalty marketing has changed considerably in recent years. Today, companies must do far more than issue discount cards or award loyalty points to retain their customers in the long term. Artificial intelligence (AI) plays an increasingly central role in this, particularly in data analysis, automation, and personalization.
Wagener, Andreas (2024)
Discussion Panel “Intelligent Loyalty 5.0”, Top Voices – The Future of Loyalty 2024.
Wolff, Dietmar; Kreidenweis, Helmut (2024)
KI in der Sozialwirtschaft – Eine Orientierungshilfe für die Praxis 2024, pp. 117-129.
Wolff, Dietmar (2024)
Talk at the conference of the Bundesverband Deutscher Stiftungen in Bamberg.
Wolff, Dietmar; Stock, Nele (2024)
Talk at the Vincentz Altenheim Digital Konferenz, online.
Einhauser, Sebastian; Asam, Claudia; Weps, Manuela; Senninger, Antonia; Peterhoff, David; Bauernfeind, Stilla; Asbach, Benedikt; Carnell, George William; Heeney, Jonathan Luke; Wytopil, Monika; Fuchs, André; Messmann, Helmut; Prelog, Martina; Liese, Johannes; Jeske, Samuel D.; Protzer, Ulrike; Hoelscher, Michael; Geldmacher, Christof; Überla, Klaus; Steininger, Philipp; Wagner, Ralf; Gall, Christine; Wieser, Andreas; Müller-Schmucker, Sandra M.; Beileke, Stephanie; Goekkaya, Mehmet; Kling, Elisabeth; Rubio-Acero, Raquel; Plank, Michael; Christa, Catharina; Willmann, Annika; Vu, Martin; Lampl, Benedikt M.J.; Almanzar, Giovanni; Kousha, Kimia; Schwägerl, Valeria; Liebl, Bernhard; Weber, Beatrix; Drescher, Johannes; Scheidt, Jörg; Siebenhaar, Yannic; Reinel, Dirk; Wogenstein, Florian; Gefeller, Olaf; Covako Study Group (2024)
eBioMedicine 110, 105438.
DOI: 10.1016/j.ebiom.2024.105438
Mehling, Simon; Hörnlein, Stefanie; Schnabel, Tobias; Beier, Silvio; Londong, Jörg (2024)
Water Reuse.
DOI: 10.2166/wrd.2024.054
Wagener, Andreas (2024)
Nerdwärts.de https://nerdwaerts.de/2024/11/macht-ki-uns-duemmer-oder-klueger-welche-kompetenzen-werden-wir-in-zukunft-noch-brauchen-und-wie-vermitteln-wir-diese/ 2024.
Falling back on ChatGPT and similar tools simplifies many everyday tasks. It is easy and obvious to have generative AI write texts in particular, or to use it to summarize long and complex articles, especially when time and attention are limited. But are we becoming too complacent as a result? Are we letting our gray matter wither? Or do such questions misjudge the technology's potential? And which skills will we still need in the future at all?
Wolff, Dietmar; Klingbeil, Darren (2024)
Altenheim - Dossier Telematikinfrastruktur 2024 63, p. 20.
Wolff, Dietmar; Klingbeil, Darren (2024)
Altenheim - Dossier Telematikinfrastruktur 2024 63, p. 20.