




Using LLMs to Improve Reproducibility of Literature Reviews.

Peinl, René; Haberl, Armin; Baernthaler, Jonathan; Chouguley, Sarang...

SIGSDA Symposium at the International Conference on Information Systems 2024. Bangkok, Thailand.


Open Access Peer Reviewed
 

Literature reviews play a crucial role in Information Systems (IS) research. However, scholars have expressed concerns regarding the reproducibility of their results and the quality of documentation. The involvement of human reproducers in these reviews is often hindered by the time-consuming nature of the procedures. The emergence of Large Language Models (LLMs) seems promising to support researchers and to enhance reproducibility. To explore this potential, we conducted experiments using various LLMs, focusing on abstract scanning, and present initial evidence suggesting that applying LLMs in structured literature reviews could help researchers formulate and refine rules for abstract scanning. Based on our preliminary findings, we identify potential future research directions in this research-in-progress paper.


Comparing human-labeled and AI-labeled speech datasets for TTS

Wirth, Johannes; Peinl, René (2024)

4th European Conference on the Impact of Artificial Intelligence and Robotics (ICAIR 2024) 2024.


Open Access Peer Reviewed
 

As the output quality of neural networks in the fields of automatic speech recognition (ASR) and text-to-speech (TTS) continues to improve, new opportunities are becoming available to train models in a weakly supervised fashion, thus minimizing the manual effort required to annotate new audio data for supervised training. While weak supervision has recently shown very promising results in the domain of ASR, speech synthesis has not yet been thoroughly investigated regarding this technique despite requiring the equivalent training dataset structure of aligned audio-transcript pairs.
In this work, we compare the performance of TTS models trained using a well-curated and manually labeled training dataset to others trained on the same audio data with text labels generated using both grapheme- and phoneme-based ASR models. Phoneme-based approaches seem especially promising, since even for wrongly predicted phonemes, the resulting word is more likely to sound similar to the originally spoken word than for grapheme-based predictions.
For evaluation and ranking, we generate synthesized audio outputs from all previously trained models using input texts sourced from a selection of speech recognition datasets covering a wide range of application domains. These synthesized outputs are subsequently fed into multiple state-of-the-art ASR models with their output text predictions being compared to the initial TTS model input texts. This comparison enables an objective assessment of the intelligibility of the audio outputs from all TTS models, by utilizing metrics like word error rate and character error rate.
Our results show that models trained on data generated with weak supervision not only achieve quality comparable to models trained on manually labeled datasets but can even outperform them, even for small, well-curated speech datasets. These findings suggest that the future creation of labeled datasets for supervised training of TTS models may not require any manual annotation but can be fully automated.
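
The intelligibility metrics named in this abstract, word error rate (WER) and character error rate (CER), can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, and whitespace tokenization without any text normalization is an assumption. Both metrics divide the Levenshtein edit distance between the TTS input text (reference) and the ASR transcript (hypothesis) by the reference length.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()  # assumption: plain whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """CER: character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

In the setup the abstract describes, `reference` would be the text fed into the TTS model and `hypothesis` the transcript an ASR model produces from the synthesized audio; lower values indicate more intelligible speech.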


Ethical Generative AI – What Kind of AI Results are Desired by Society?

Peinl, René; Wagener, Andreas; Lehmann, Marc (2024)

4th European Conference on the Impact of Artificial Intelligence and Robotics (ICAIR 2024), Lisbon, Portugal 2024.


Open Access Peer Reviewed
 

There are many publications about the biases found in generative AI solutions like large language models (LLMs, e.g., Mistral) or text-to-image models (T2IMs, e.g., Stable Diffusion). However, there is hardly any publication that questions what kind of behavior is actually desired, not only by a couple of researchers, but by society in general. Most researchers in this area seem to think that there would be a common agreement, but political debate in other areas shows that this is seldom the case, even within a single country. Climate change, for example, is an empirically well-proven scientific fact, and 197 countries (including Germany) have pledged in the Paris Agreement to do their best to limit global warming to a maximum of 1.5°C, yet renowned German scientists still call LLMs biased if they state that there is human-made climate change and that humanity is not doing enough to stop it. This trend is especially visible in Western individualistic societies that favor personal well-being over the common good. In this article, we explore different aspects of biases found in LLMs and T2IMs, highlight potential divergence in the perception of ethically desirable outputs, and discuss potential solutions with their advantages and drawbacks from the perspective of society. The analysis is carried out in an interdisciplinary manner, with the authors coming from backgrounds as diverse as business information systems, political science, and law. Our contribution brings new insights to this debate and sheds light on an important aspect of the discussion that has been largely ignored up to now.


Die innere Stimme - Wenn der Chatbot den Roboter steuert. [The Inner Voice: When the Chatbot Controls the Robot]

Peinl, René (2024)

c't Magazin für Computertechnik 2024 (23), pp. 130-132.


 

Robots that work autonomously and flexibly could help around the house in the future. To plan their steps, they need artificial intelligence. Generative language models are meant not only to write sentences or program code for this purpose, but also to structure the workflows.


White-box LLM-supported Low-code Engineering: A Vision and First Insights

Buchmann, Thomas; Peinl, René; Schwägerl, Felix (2024)

27th International Conference on Model Driven Engineering Languages and Systems (Models 2024) 2024, pp. 556-560.
DOI: 10.1145/3652620.3687803


Open Access Peer Reviewed
 

Low-code development (LCD) platforms promise to empower citizen developers to define core domain models and rules for business applications. However, as domain rules grow complex, LCD platforms may fail to do so effectively. Generative AI, driven by large language models (LLMs), offers source code generation from natural language but suffers from its non-deterministic black-box nature and limited explainability. Therefore, rather than having LLMs generate entire applications from single prompts, we advocate for a white-box approach allowing citizen developers to specify domain models semi-formally, attaching constraints and operations as natural language annotations. These annotations are fed incrementally into an LLM contextualized with the generated application stub. This results in deterministic and better explainable generation of static application components, while offering citizen developers an appropriate level of abstraction. We report on a case study in manufacturing execution systems, where the implementation of the approach provides first insights.


Mit allen Sinnen - Multimodale KIs kombinieren Bild und Text. [With All Senses: Multimodal AIs Combine Image and Text]

Peinl, René (2024)

c't Magazin für Computertechnik 2024 (11), pp. 52-56.


 

Hardly have people gotten used to text and image generators when OpenAI, Google, Microsoft, and Meta release their multimodal models, which unite both worlds. This gives practical AI applications and even robots a more comprehensive understanding of the world.


Evaluation of Medium-Sized Language Models in German and English Language

Peinl, René; Wirth, Johannes (2024)

International Journal on Natural Language Computing (IJNLC) 2024 (1).


Open Access
 

Large language models (LLMs) have garnered significant attention, but the definition of “large” lacks clarity. This paper focuses on medium-sized language models (MLMs), defined as having at least six billion parameters but fewer than 100 billion. The study evaluates MLMs regarding zero-shot generative question answering in German and English, which requires models to provide elaborate answers without external document retrieval (RAG). The paper introduces its own test dataset and presents results from human evaluation. Results show that combining the best answers from different MLMs yielded an overall correct answer rate of 82.7%, which is better than the 60.9% of ChatGPT. The best English MLM achieved 71.8% and has 33B parameters, which highlights the importance of using appropriate training data for fine-tuning rather than solely relying on the number of parameters. The best German model also surpasses ChatGPT for the equivalent dataset. More fine-grained feedback should be used to further improve the quality of answers. The open source community is quickly closing the gap to the best commercial models.


Wenn Prozessmodellierung Realität wird - Modernes Geschäftsprozessmanagement [When Process Modeling Becomes Reality: Modern Business Process Management]

Peinl, René (2023)

IM+io - Best Practices aus Digitalisierung | Management | Wissenschaft 2023 (4).


 

In industry, a lot of time is still wasted modeling business processes as diagrams in order to get an overview, analyze them, and improve them. For software support as executable processes in an enterprise information system such as ERP or MES, however, they have to be implemented a second time in a system-specific way. In the production environment, the open source system HiCuMES, a mix of low-code tool with graphical editors and manufacturing execution system, offers a different way.


SPORENLP: A Spatial Recommender System for Scientific Literature

Wirth, Johannes; Roßner, Daniel; Peinl, René; Atzenbeck, Claus (2023)

Proceedings of the 19th International Conference on Web Information Systems and Technology (WEBIST'23) 2023, pp. 429–436.
DOI: 10.5220/0012210400003584


Open Access Peer Reviewed
 

SPORENLP is a recommendation system designed to review scientific literature. It operates on a sub-dataset comprising 15,359 publications, with a total of 117,941,761 pairwise comparisons. This dataset includes both metadata comparisons and text-based similarity aspects obtained using natural language processing (NLP) techniques. Unlike other recommendation systems, SPORENLP does not rely on specific aspect features. Instead, it identifies the top k candidates based on shared keywords and embedding-related similarities between publications, enabling content-based, intuitive, and adjustable recommendations without excluding possible candidates through classification. To provide users with an intuitive interface for interacting with the dataset, we developed a web-based front-end that takes advantage of the principles of spatial hypertext. A qualitative expert evaluation was conducted on the dataset. The dataset creation pipeline and the source code for SPORENLP will be made freely available to the research community, allowing further exploration and improvement of the system.
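
The two figures in this abstract are consistent with each other if each comparison is an unordered pair of distinct publications, which is an assumption here since the abstract does not state how pairs are counted. Under that assumption the count is n(n-1)/2, as the short check below shows; the function name is hypothetical.

```python
from math import comb


def pairwise_comparisons(n_items):
    """Number of unordered pairs of distinct items: n * (n - 1) / 2."""
    return comb(n_items, 2)


# 15,359 publications, as stated in the abstract
print(pairwise_comparisons(15_359))  # 117941761
```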


Klein aber fein - Wie kompakte Sprachmodelle die Giganten herausfordern [Small but Mighty: How Compact Language Models Challenge the Giants]

Peinl, René (2023)

c't - Magazin für Computertechnik 2023 (26), pp. 50-55.


 

For a while, the parameter count of large language models knew only one direction: steeply upward. More parameters mean more and higher-quality capabilities, or so the conviction went. But 2023 was the hour of the medium-sized language AIs: they are frugal and surprisingly competitive. In some disciplines they come remarkably close to GPT-4 with its rumored 1.8 trillion parameters. This opens up enormous potential, also for small and medium-sized companies that are toying with the idea of building their own applications. We explain what the lean relatives of the giants can do, what makes them so efficient, and what the future of the language model landscape might look like.


Evaluation of medium-large Language Models at zero-shot closed book generative question answering

Peinl, René; Wirth, Johannes (2023)

11th International Conference on Artificial Intelligence and Applications (AIAP) 2023.


Open Access Peer Reviewed
 

Large language models (LLMs) have garnered significant attention, but the definition of "large" lacks clarity. This paper focuses on medium-sized language models (MLMs), defined as having at least six billion parameters but fewer than 100 billion. The study evaluates MLMs regarding zero-shot generative question answering, which requires models to provide elaborate answers without external document retrieval. The paper introduces its own test dataset and presents results from human evaluation. Results show that combining the best answers from different MLMs yielded an overall correct answer rate of 82.7%, which is better than the 60.9% of ChatGPT. The best MLM achieved 46.4% and has 7B parameters, which highlights the importance of using appropriate training data for fine-tuning rather than solely relying on the number of parameters. More fine-grained feedback should be used to further improve the quality of answers.


The Hochschul-Assistenz-System HAnS: An ML-Based Learning Experience Platform

Ranzenberger, Thomas; Bocklet, Tobias; Freisinger, Steffen; Frischholz, Lia...

Elektronische Sprachsignalverarbeitung 2023 2023.


Open Access Peer Reviewed
 

The usage of e-learning platforms, online lectures and online meetings for academic teaching increased during the Covid-19 pandemic. Lecturers created video lectures, screencasts, or audio podcasts for online learning. The Hochschul-Assistenz-System (HAnS) is a learning experience platform that uses machine learning (ML) methods to support students and lecturers in the online learning and teaching processes. HAnS is being developed in multiple iterations as an agile open-source collaborative project supported by multiple universities and partners. This paper presents the current state of the development of HAnS on German video lectures.


Dependencies between MES features and efficient introduction

Peinl, René; Purucker, Susanne K; Vogel, Sabine (2022)

14th International Conference on ENTERprise Information Systems (CENTERIS 2022).


Peer Reviewed

Automatic Speech Recognition in German - A Detailed Error Analysis

Wirth, Johannes; Peinl, René (2022)

IEEE Coins - International Conference on Omni Layer Intelligent Systems.


Open Access Peer Reviewed
 

The number of freely available systems for automatic speech recognition (ASR) based on neural networks is growing steadily, and their predictions are becoming increasingly reliable. However, the evaluation of trained models is typically based exclusively on statistical metrics such as WER or CER, which do not provide any insight into the nature or impact of the errors produced when predicting transcripts from speech input. This work presents a selection of ASR model architectures that are pretrained on the German language and evaluates them on a benchmark of diverse test datasets. It identifies cross-architectural prediction errors, classifies them into categories, and traces the sources of errors per category back to the training data as well as other sources. Finally, it discusses solutions for creating qualitatively better training datasets and more robust ASR systems.


Quality Assurance for Speech Synthesis with ASR

Peinl, René; Wirth, Johannes (2022)

Intelligent Systems Conference (IntelliSys 2022).


Open Access Peer Reviewed
 

Autoregressive TTS models are still widely used. Due to their stochastic nature, the output may vary from very good to completely unusable from one inference to another. In this publication, we propose to use the percentage of completely correct transcribed sentences (PCTS) of an ASR system as a new objective quality measure for TTS inferences. PCTS is easy to measure and represents the intelligibility dimension of a typical subjective evaluation with mean opinion score (MOS). We show that PCTS leads to results similar to those of subjective MOS evaluation. A more detailed, semi-automatic error analysis of the differences between ASR transcripts of TTS speech and the text used for generating the TTS speech can help identify problems in the TTS training data that are harder to find with other methods.
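
As a rough illustration of the PCTS measure described in this abstract, the sketch below counts the share of sentences whose ASR transcript matches the TTS input text exactly. It is not the authors' implementation: the function names are hypothetical, and the normalization step (lowercasing, stripping ASCII punctuation, collapsing whitespace) is an assumption, since the abstract does not specify how texts are compared.

```python
import string


def normalize(text):
    """Assumed normalization: lowercase, drop ASCII punctuation, collapse spaces."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def pcts(input_texts, asr_transcripts):
    """Percentage of sentences whose ASR transcript matches the input text exactly."""
    if len(input_texts) != len(asr_transcripts):
        raise ValueError("both lists must have the same length")
    if not input_texts:
        raise ValueError("at least one sentence is required")
    correct = sum(
        normalize(text) == normalize(transcript)
        for text, transcript in zip(input_texts, asr_transcripts)
    )
    return 100.0 * correct / len(input_texts)
```

Unlike an averaged error rate, this all-or-nothing count penalizes a sentence with a single wrong word as heavily as an unusable one, which fits the abstract's concern with outputs that vary from very good to completely unusable.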


Presence in VR experiences - an empirical cost-benefit-analysis

Peinl, René; Wirth, Tobias (2022)

6th International Congress on Information and Communication Technology (ICICT 2021).


Open Access Peer Reviewed
 

Virtual reality (VR) is on the verge of becoming a mainstream platform for gaming, education, and product design. The feeling of being present in the virtual world is influenced by many factors and, even more intriguingly, a single negative influence can destroy the illusion that was created with a lot of effort by other measures. Therefore, it is crucial to balance the influencing factors, know their importance, and have a good estimate of how much effort it takes to bring each factor to a certain level of fidelity. This paper collects influencing factors discussed in the literature, analyzes the immersion of current off-the-shelf VR solutions, and presents results from an empirical study on the efforts and benefits of certain aspects influencing presence in VR experiences. It turns out that delivering high fidelity is sometimes easier than delivering medium fidelity, and that for other aspects it is worthwhile to invest more effort in higher fidelity because it improves presence considerably.


Neural Speech Synthesis in German

Wirth, Johannes; Puchtler, Pascal; Peinl, René (2021)

14th International Conference on Advances in Human-oriented and Personalized Mechanisms, Technologies, and Services (CENTRIC 2021).


Open Access Peer Reviewed
 

While many speech synthesis systems based on deep neural networks are thoroughly evaluated and released for free use in English, models for languages with far fewer active speakers, like German, are rarely trained and most often not published for common use. This work covers specific challenges in training text-to-speech models for the German language, including dataset selection and data preprocessing, and presents the training process for multiple models of an end-to-end text-to-speech system based on a combination of Tacotron 2 and Multi-Band MelGAN. All model compositions were evaluated using the mean opinion score, which revealed results comparable to models in the literature that are trained and evaluated on English datasets. In addition, empirical analyses identified distinct aspects influencing the quality of such systems, based on subjective user experience. All trained models are released for public use.


Technical Lag in OSS Integration

Weber, Thomas; Peinl, René (2021)

World Congress in Computer Science, Computer Engineering & Applied Computing (CSCE‘21).


Open Access Peer Reviewed
 

Software integration, especially in Open Source Software (OSS), suffers from technical debt as software evolves over time. Evolution often results in incompatibility issues such as changed or removed APIs. The problems increase as more components need to be integrated. This article analyzes common problems in OSS integration with respect to technical debt, especially externally induced technical debt, and collects empirical evidence from the case study "AMiProSI", which combines different functions of various OSS systems to create a unified intranet solution. As part of the analysis, this paper discusses component-based development and its related dependencies and explains the related technical debt (TD). By breaking down the individual problems into sub-categories of technical debt, we show what kind of TD has arisen in our case study AMiProSI and give advice on how it could be prevented in other projects.


HUI-Audio-Corpus-German: A high quality TTS dataset

Puchtler, Pascal; Wirth, Johannes; Peinl, René (2021)

44th German Conference on Artificial Intelligence (KI2021).


Open Access Peer Reviewed
 

The increasing availability of audio data on the internet has led to a multitude of datasets for the development and training of text-to-speech applications based on neural networks. Highly differing voice quality, low sampling rates, lack of text normalization, and disadvantageous alignment of audio samples to the corresponding transcript sentences still limit the performance of deep neural networks trained on this task. Additionally, data resources in languages like German are still very limited. We introduce the "HUI-Audio-Corpus-German", a large, open-source dataset for TTS engines, created with a processing pipeline that produces high-quality audio-to-transcription alignments and decreases the manual effort needed for creation.


Open Source Speech Recognition on Edge Devices

Peinl, René; Rizk, Basem; Szabad, Robert (2020)

10th International Conference on Advanced Computer Information Technologies (ACIT).


Open Access Peer Reviewed
 

Deep learning has revived the field of automatic speech recognition (ASR) in the last ten years and pushed recognition rates into regions on par with humans. Applications like Siri, Amazon Alexa, and Google Assistant are very popular but have inherent privacy problems. In this paper, we evaluate state-of-the-art open source ASR models regarding their usability in a smart speaker without cloud, both in terms of accuracy and runtime performance on cost-effective low-power edge devices. We found Kaldi to be the most accurate solution and also among the fastest ones. It runs more than fast enough on an Nvidia Jetson Nano. It is still not on par with commercial cloud services but is getting close.


Prof. Dr. René Peinl


Hochschule für Angewandte Wissenschaften Hof

Forschungsgruppe Systemintegration (SI)
Alfons-Goppel-Platz 1
95028 Hof

T +49 9281 409-4820
rene.peinl[at]hof-university.de