Label-Efficient School Detection from Aerial Imagery via Weakly Supervised Pretraining and Fine-Tuning

2026-05-05 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning

AI summary

The authors developed a method to find schools using aerial images without needing many hand-labeled examples. They first create automatic labels from limited location points and segmentations, then train a model to recognize schools. After this, they improve the model with a small number of carefully labeled images. This approach works well even with very little human-labeled data, making it easier and cheaper to map schools worldwide. Their work helps plan education infrastructure and internet access in places with poor records.

school detection, aerial imagery, weak supervision, semantic segmentation, bounding boxes, object detection, low-data regime, infrastructure mapping, automatic labeling, fine-tuning

Authors

Zakarya Elmimouni, Fares Fourati, Mohamed-Slim Alouini

Abstract

Accurate school detection is essential for supporting education initiatives, including infrastructure planning and expanding internet connectivity to underserved areas. However, many regions around the world face challenges due to outdated, incomplete, or unavailable official records. Manual mapping efforts, while valuable, are labor-intensive and do not scale across large geographic areas. To address this, we propose a weakly supervised framework for school detection from aerial imagery that minimizes the need for human annotations while supporting global mapping efforts. Our method is specifically designed for low-data regimes, where manual annotations are extremely scarce. We introduce an automatic labeling pipeline that leverages sparse location points and semantic segmentation to generate infrastructure masks, from which we derive bounding boxes. In a first training stage, we train our detectors on these automatically labeled images to learn a representation of what schools look like; in a second stage, we fine-tune the resulting models on a small, clean set of manually labeled images. This two-stage training pipeline enables large-scale detection of school infrastructure with minimal supervision and remains effective in low-data settings. Our results demonstrate strong object detection performance, particularly in the low-data regime, where the models achieve promising results using only 50 manually labeled images, significantly reducing the need for costly annotations. This framework supports education and connectivity initiatives worldwide by providing an efficient and extensible approach to mapping schools from space. All models, training code, and auto-labeled data will be publicly released to foster future research and real-world impact.
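The step from infrastructure masks to bounding boxes can be illustrated with a minimal sketch. The abstract does not specify how the authors derive boxes, so everything here is an assumption made for illustration: the mask is a binary grid, boxes are taken as the extents of 4-connected components, and the box whose center lies closest to a known location point is kept. The names `mask_to_boxes`, `box_for_point`, and `min_area` are hypothetical and do not come from the paper.

```python
from collections import deque


def mask_to_boxes(mask, min_area=1):
    """Return (x_min, y_min, x_max, y_max) for each 4-connected
    foreground component of a binary mask (list of rows of 0/1).
    Assumption: connected components stand in for the paper's
    unspecified mask-to-box step."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for row in range(height):
        for col in range(width):
            if not mask[row][col] or seen[row][col]:
                continue
            # Breadth-first flood fill over one component.
            queue = deque([(row, col)])
            seen[row][col] = True
            x_min = x_max = col
            y_min = y_max = row
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Drop tiny components, which are likely segmentation noise.
            if area >= min_area:
                boxes.append((x_min, y_min, x_max, y_max))
    return boxes


def box_for_point(boxes, point):
    """Pick the box whose center is closest to an (x, y) location point.
    Assumption: nearest-center matching, chosen only for illustration."""
    if not boxes:
        return None
    px, py = point

    def squared_distance(box):
        cx = (box[0] + box[2]) / 2
        cy = (box[1] + box[3]) / 2
        return (cx - px) ** 2 + (cy - py) ** 2

    return min(boxes, key=squared_distance)


# Toy mask with two separate structures and one location point.
toy_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
toy_boxes = mask_to_boxes(toy_mask)
school_box = box_for_point(toy_boxes, (2, 1))
```

On real imagery, a library routine for connected-component labeling would likely replace the hand-written flood fill, and the matching rule between points and boxes would need to account for point-location error.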
