SITUATE - Synthetic Object Counting Dataset for VLM training

Abstract

We present SITUATE, a novel dataset designed for training and evaluating Vision Language Models (VLMs) on counting tasks with spatial constraints. The dataset bridges the gap between simple 2D datasets such as VLMCountBench and often ambiguous real-world datasets such as TallyQA, which lack control over occlusions and spatial composition. Experiments show that our dataset helps improve generalization to out-of-distribution images: fine-tuning Qwen2.5-VL-7B on SITUATE improves accuracy on the PixMo-Count test data, but not vice versa. We cross-validate this finding by comparing model performance on other established counting benchmarks and against a model fine-tuned on an equally sized training set derived from PixMo-Count.

More about this title

Title: SITUATE - Synthetic Object Counting Dataset for VLM training
Venue: 21st International Conference on Computer Vision Theory and Applications (VISAPP26), Marbella, Spain
Authors: Prof. Dr. René Peinl, Vincent Tischler, Patrick Schröder, Prof. Dr. Christian Groth
Publication date: 10 March 2026
Project title: M4-SKI
Citation: Peinl, René; Tischler, Vincent; Schröder, Patrick; Groth, Christian (2026): SITUATE - Synthetic Object Counting Dataset for VLM training. 21st International Conference on Computer Vision Theory and Applications (VISAPP26), Marbella, Spain.