Beyond Segmentation: Structurally Informed Facade Parsing from Imperfect Images

2026-04-10, Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Graphics, Machine Learning
AI summary

The authors observe that standard object detectors treat the elements of a building facade independently, which produces parsings that lack the structural coherence needed to reconstruct the building procedurally. They augment the YOLOv8 training objective with a lightweight alignment loss that encourages detected bounding boxes to form grid-consistent arrangements. On the CMP dataset, the method corrects alignment errors caused by perspective and occlusion, with a controllable trade-off against standard detection accuracy. The inference pipeline is unchanged, so the more regular output can be passed directly to downstream steps.

object detection, YOLOv8, facade parsing, bounding boxes, alignment loss, geometric priors, CMP dataset, structural regularity, perspective errors, occlusion
Authors
Maciej Janicki, Aleksander Plocharski, Przemyslaw Musialski
Abstract
Standard object detectors typically treat architectural elements independently, often resulting in facade parsings that lack the structural coherence required for downstream procedural reconstruction. We address this limitation by augmenting the YOLOv8 training objective with a custom lightweight alignment loss. This regularization encourages grid-consistent arrangements of bounding boxes during training, effectively injecting geometric priors without altering the standard inference pipeline. Experiments on the CMP dataset demonstrate that our method successfully improves structural regularity, correcting alignment errors caused by perspective and occlusion while maintaining a controllable trade-off with standard detection accuracy.
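
To make the idea of an alignment regularizer concrete, the following is a minimal sketch of one possible grid-alignment penalty. It is not the authors' loss, whose exact form the abstract does not give. The names `alignment_penalty` and `total_loss`, the pixel threshold `tau`, and the weight `lam` are all hypothetical. The sketch assumes that edges of two boxes lying within `tau` pixels of each other are meant to be aligned and penalizes their remaining offset. In real training such a term would need to be written with differentiable tensor operations; plain Python is used here only for readability.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2): left, top, right, bottom in pixels.
Box = Tuple[float, float, float, float]


def alignment_penalty(boxes: List[Box], tau: float = 8.0) -> float:
    """Hypothetical grid-alignment penalty (an assumption, not the paper's loss).

    For every pair of boxes and every edge (left, top, right, bottom),
    if the two edges are closer than `tau` they are treated as "meant to
    be aligned" and their absolute offset is penalized. Returns the mean
    offset over all such near-aligned edge pairs, or 0.0 if there are none.
    """
    total = 0.0
    count = 0
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            for k in range(4):  # iterate over the four edge coordinates
                offset = abs(boxes[i][k] - boxes[j][k])
                if offset < tau:
                    total += offset
                    count += 1
    return total / count if count else 0.0


def total_loss(detection_loss: float, boxes: List[Box],
               lam: float = 0.1, tau: float = 8.0) -> float:
    """Detection loss plus the weighted alignment term.

    `lam` is an assumed knob standing in for the "controllable trade-off"
    between structural regularity and detection accuracy.
    """
    return detection_loss + lam * alignment_penalty(boxes, tau)


# A perfectly regular 2x2 grid of windows incurs no penalty.
grid = [(0, 0, 10, 10), (20, 0, 30, 10), (0, 20, 10, 30), (20, 20, 30, 30)]
# Two windows in the same row, the second shifted down by 2 pixels.
skewed = [(0, 0, 10, 10), (20, 2, 30, 12)]

print(alignment_penalty(grid))
print(alignment_penalty(skewed))
print(total_loss(1.0, skewed, lam=0.5))
```

Because the penalty enters only through the training objective, a detector trained this way would run at inference exactly as before, consistent with the abstract's statement that the standard inference pipeline is unaltered. Raising `lam` in this sketch would favor regular layouts over the base detection loss.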