A Benchmark for Semi-supervised Multi-modal Crowd Counting
2026-06-02 • Computer Vision and Pattern Recognition
AI summary
The authors introduce the first benchmark for counting crowds from multiple types of data (modalities) when only a limited number of examples are labeled. They define a setup in which only part of the training data is labeled and specify a standard way to partition the data into labeled and unlabeled subsets at different labeled ratios. To provide comparison points, they adapt existing methods of two kinds: fully supervised multi-modal methods and semi-supervised single-modal methods. They then evaluate these methods under the new benchmark. The authors plan to share their code and data partition publicly.
semi-supervised learning, multi-modal data, crowd counting, benchmark, data partition, labeled data, unlabeled data, supervised learning, baseline methods, evaluation protocol
Authors
Haoliang Meng, Xiaopeng Hong, Yabin Wang, Wangmeng Zuo
Abstract
This paper constructs the first benchmark for semi-supervised multi-modal crowd counting. To lay the foundation for this unexplored task, we first formulate the semi-supervised multi-modal setting and a standardized protocol that specifies the labeled-unlabeled data partition across different labeled ratios. Next, to establish solid reference points, we tailor a diverse set of representative baselines, including existing fully supervised multi-modal methods and semi-supervised single-modal methods. We then evaluate their performance under the proposed benchmark. Code and the data partition will be released at https://github.com/HenryCilence/Semi-supervised-Multimodal-Crowd-Counting.
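To make the idea of a fixed labeled-unlabeled partition concrete, the following is a minimal sketch of how such a split could be generated reproducibly. It is not the authors' released protocol: the function name `partition`, the seed, the example ratios (5%, 10%, 40%), and the assumption that each ID stands for one multi-modal sample (all modalities of a scene kept together) are hypothetical choices made for illustration.

```python
import random


def partition(sample_ids, labeled_ratio, seed=0):
    """Split sample IDs into (labeled, unlabeled) lists.

    Hypothetical helper, not the benchmark's official partition.
    Each ID is assumed to identify one multi-modal sample, so all
    modalities of a scene land on the same side of the split.
    """
    if not 0.0 < labeled_ratio <= 1.0:
        raise ValueError("labeled_ratio must be in (0, 1]")
    ids = sorted(sample_ids)      # canonical order, independent of input order
    rng = random.Random(seed)     # local RNG so the global state is untouched
    rng.shuffle(ids)
    n_labeled = max(1, round(labeled_ratio * len(ids)))
    return ids[:n_labeled], ids[n_labeled:]


# Example ratios are assumptions, not values taken from the paper.
all_ids = range(100)
splits = {r: partition(all_ids, r) for r in (0.05, 0.10, 0.40)}
for ratio, (labeled, unlabeled) in splits.items():
    print(f"{ratio:.0%}: {len(labeled)} labeled, {len(unlabeled)} unlabeled")
```

Because every ratio takes a prefix of the same seeded shuffle, the labeled set at a smaller ratio is contained in the labeled set at a larger one. That nesting is a design choice of this sketch; whether the released partition has the same property is not stated in the abstract.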