Unlocking UML Class Diagram Understanding in Vision Language Models

Abstract

Although Vision Language Models (VLMs) have made tremendous progress across a wide range of use cases, they still answer questions about diagrams less reliably than questions about photos. While progress has been made on bar charts, line charts and similar diagrams, there is still little research on other types of diagrams, e.g. in the computer science domain. We identified a gap in research on visual question answering on UML class diagrams. Our objective is to fill this gap by analyzing the performance of popular open-weight VLMs on a self-constructed benchmark for visual question answering on UML class diagrams that is both challenging and manageable. We further construct a large-scale training dataset with 16,000 image-question-answer triples based on real software repositories on GitHub. We focus on Java-based repositories and filter for projects small enough to fit readably on a single diagram of at most 4000x4000 pixels. We ask questions based on 18 question templates. We show that a LoRA-based fine-tune of Qwen 2.5 VL 7B easily outperforms Qwen 3.5 27B, a recent VLM that performs well on many other benchmarks.


More about this title

Title	Unlocking UML Class Diagram Understanding in Vision Language Models
Venue	11th Future Technologies Conference (FTC 2026), 15-16 October 2026, Berlin, Germany
Volume	2026
Authors	Artem Naboichenko, Prof. Dr. René Peinl
Publication date	15 October 2026
Project title	M4-SKI
Citation	Naboichenko, Artem; Peinl, René (2026): Unlocking UML Class Diagram Understanding in Vision Language Models. 11th Future Technologies Conference (FTC 2026), 15-16 October 2026, Berlin, Germany 2026.