Although Vision Language Models (VLMs) have made tremendous progress across a wide range of use cases, they still answer questions about diagrams less reliably than questions about photos. Progress has been made on bar charts, line charts and similar diagrams, but there is still little research on other diagram types, e.g. those from the computer science domain. We identified a research gap in visual question answering on UML class diagrams. Our objective is to fill this gap by analyzing the performance of popular open-weight VLMs on a self-constructed benchmark for visual question answering on UML class diagrams that is both challenging and manageable. We further construct a large-scale training dataset with 16,000 image-question-answer triples based on real software repositories on GitHub. We focus on Java-based repositories and filter for projects small enough to fit readably on a single diagram of at most 4000x4000 pixels. Questions are generated from 18 question templates. We show that a LoRA-based finetune of Qwen 2.5 VL 7B easily outperforms Qwen 3.5 27B on this benchmark, although the latter is a recent VLM that performs well in many other benchmarks.
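The template-based question generation mentioned in the abstract could look roughly like the following sketch. It is only an illustration under stated assumptions: the `UmlClass` structure, the two template strings and the `generate_qa` helper are hypothetical and are not taken from the paper, which does not list its 18 templates here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class UmlClass:
    """Hypothetical, simplified stand-in for one class box in a diagram."""
    name: str
    attributes: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)
    parent: Optional[str] = None


# Two illustrative templates (assumed wording, not the paper's templates).
TEMPLATES = {
    "method_count": "How many methods does the class {name} have?",
    "superclass": "Which class does {name} inherit from?",
}


def generate_qa(classes: List[UmlClass]) -> List[Tuple[str, str]]:
    """Fill each template from the class model and derive the answer from it."""
    pairs = []
    for cls in classes:
        question = TEMPLATES["method_count"].format(name=cls.name)
        pairs.append((question, str(len(cls.methods))))
        # The superclass question is only asked for classes that have a parent.
        if cls.parent is not None:
            question = TEMPLATES["superclass"].format(name=cls.name)
            pairs.append((question, cls.parent))
    return pairs


diagram = [
    UmlClass("Shape", methods=["area", "perimeter"]),
    UmlClass("Circle", attributes=["radius"], methods=["area"], parent="Shape"),
]
qa_pairs = generate_qa(diagram)
for question, answer in qa_pairs:
    print(question, "->", answer)
```

Because the answers are computed from the same class model that would be rendered as the diagram image, each question-answer pair stays consistent with the picture by construction.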
| Title | Unlocking UML Class Diagram Understanding in Vision Language Models |
|---|---|
| Volume | 2026 |
| Authors | Artem Naboichenko, Prof. Dr. René Peinl |
| Publication date | 15 October 2026 |
| Project title | M4-SKI |
| Citation | Naboichenko, Artem; Peinl, René (2026): Unlocking UML Class Diagram Understanding in Vision Language Models. 2026. |