Modeling Depth Ambiguity: A Mixture-Density Representation for Flying-Point-Free Depth Estimation

2026-06-01 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence

AI summary

The authors find that depth estimation models often fail near object edges because they assign a single depth to each pixel, even when a pixel covers both a near and a far surface. This produces spurious points in the empty space between the two surfaces. They propose MDA, a method that lets the model predict several candidate depths per pixel, each with an associated probability, which largely removes these errors near edges. The approach also handles transparent objects and sky regions, and it improves depth quality with negligible runtime overhead.

depth estimation, flying points, mixture density, foreground-background boundaries, depth hypotheses, transparent objects, sky segmentation, 3D reconstruction, pixel depth ambiguity, runtime overhead

Authors

Siyuan Bian, Congrong Xu, Jun Gao

Abstract

Despite advances in depth estimation, flying points remain a persistent failure mode: near object boundaries, depth estimators often predict spurious 3D points in the empty space between foreground and background surfaces. We trace this artifact to a standard modeling choice: assigning each pixel a single depth hypothesis. At boundaries, a pixel can straddle a foreground and a background surface, so its true depth is ambiguous between the two. A model that predicts a single depth cannot keep both possibilities, so training instead pulls the prediction toward an intermediate depth that lies on neither surface. We address this with MDA, a mixture-density representation that lets the model predict multiple depth hypotheses and their associated probabilities for each pixel. Near boundaries, different hypotheses can align with different surfaces, and the decoded depth is selected from one of these hypotheses rather than placed in the empty space between them. Across different backbones, MDA substantially improves boundary reconstruction and largely removes flying-point artifacts even under severe input blur, while adding negligible runtime overhead. The same mixture-density framework naturally extends to transparent objects, where it predicts multiple depth layers at transparent pixels, and to sky regions, where a dedicated component separates the unbounded sky from finite-depth regions, producing flying-point-free skylines. Project Page: https://biansy000.github.io/mda-site/.
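
To make the boundary argument concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the mixture family, the loss, or the exact decoding rule, so this sketch assumes that each pixel carries K depth hypotheses with normalized probabilities and that decoding picks the most probable hypothesis. The function names `decode_mean` and `decode_most_probable` are hypothetical. It contrasts that selection with a probability-weighted average, which behaves like a single-depth regressor and lands between the two surfaces.

```python
import numpy as np

def decode_mean(weights, depths):
    """Probability-weighted average of the hypotheses.

    Mimics a single-depth prediction: at a boundary the result
    falls between the foreground and background surfaces.
    """
    return (weights * depths).sum(axis=-1)

def decode_most_probable(weights, depths):
    """Select the hypothesis with the highest probability (assumed rule).

    The decoded depth always coincides with one of the hypotheses,
    so it cannot land in the empty space between them.
    """
    idx = weights.argmax(axis=-1)
    return np.take_along_axis(depths, idx[..., None], axis=-1)[..., 0]

# Toy boundary pixel: foreground at 1.0 m, background at 5.0 m,
# with the pixel slightly more likely to belong to the foreground.
weights = np.array([[0.55, 0.45]])  # shape (pixels, K), rows sum to 1
depths = np.array([[1.0, 5.0]])     # shape (pixels, K), depth in meters

mean_depth = decode_mean(weights, depths)           # about 2.8: a flying point
mode_depth = decode_most_probable(weights, depths)  # 1.0: on the foreground surface
print(mean_depth, mode_depth)
```

In this toy case the averaged depth of about 2.8 m lies on neither surface, while the selected depth stays on the foreground at 1.0 m. Any real decoder may differ from the argmax rule assumed here.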
