MLP Splatting: Object-Centric Neural Fields

2026-06-02 · Computer Vision and Pattern Recognition

AI summary

The authors introduce MLP-Splatting, a 3D scene representation that can be split into separate parts directly, unlike previous methods that need an extra segmentation or grouping step to separate objects. Each part is represented by a small neural network (an MLP) with local spatial support, which keeps detail local while offering more expressive capacity than low-level Gaussian primitives, and still yields photorealistic novel-view synthesis. The method is trained on color images alone and lets users edit individual scene parts without segmentation masks. It also uses less memory and renders faster than comparable state-of-the-art semantic 3D Gaussian Splatting methods.
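To make "a small MLP with local spatial support that predicts radiance and opacity" concrete, here is a minimal sketch of one such primitive. Everything specific is an assumption for illustration, not the authors' design: the class name `LocalMLPPrimitive`, the spherical support of radius `radius` around `center`, the two-layer ReLU network, the view-direction input, and the Gaussian window that multiplies the predicted opacity and cuts it to zero outside the support.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class LocalMLPPrimitive:
    """Hypothetical neural primitive: a tiny MLP whose opacity is gated by a
    spatial window, so the primitive only affects a bounded region."""

    def __init__(self, center, radius, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.center = np.asarray(center, dtype=float)  # assumed primitive centre
        self.radius = float(radius)                    # assumed support radius
        # Input: 3D position in the primitive's local frame + 3D view direction.
        self.w1 = rng.normal(0.0, 0.5, size=(6, hidden))
        self.b1 = np.zeros(hidden)
        # Output: RGB radiance (3) + opacity logit (1).
        self.w2 = rng.normal(0.0, 0.5, size=(hidden, 4))
        self.b2 = np.zeros(4)

    def __call__(self, points, view_dirs):
        """points, view_dirs: (N, 3) arrays -> (rgb (N, 3), opacity (N,))."""
        local = (np.asarray(points, dtype=float) - self.center) / self.radius
        feats = np.concatenate([local, view_dirs], axis=1)
        h = np.maximum(feats @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        out = h @ self.w2 + self.b2
        rgb = _sigmoid(out[:, :3])                       # radiance in [0, 1]
        raw_opacity = _sigmoid(out[:, 3])
        # Gaussian falloff (std 0.5 in normalised units), truncated to exactly
        # zero outside the unit ball: this is what keeps the primitive local.
        d2 = np.sum(local ** 2, axis=1)
        window = np.exp(-2.0 * d2) * (d2 <= 1.0)
        return rgb, raw_opacity * window
```

Because the window is zero outside the support, a point far from `center` receives no opacity from this primitive no matter what the MLP outputs. That locality is what makes it plausible for a single primitive to stand for one object or object part.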

3D representation, Novel-view synthesis, Neural Radiance Fields (NeRF), 3D Gaussian Splatting, MLP (Multi-Layer Perceptron), Volumetric rendering, Scene decomposition, Opacity and radiance, Semantic segmentation
Authors
Shinjeong Kim, Yuzhou Cheng, Xin Kong, Paul H. J. Kelly, Andrew J. Davison
Abstract
3D representations are fundamental to scene rendering, understanding, and interaction. Recent approaches, such as 3D Gaussian Splatting and Neural Radiance Fields, achieve impressive photorealistic novel-view synthesis, but cannot easily decompose a scene into a few primitives and require additional segmentation or grouping for object-level manipulation. We present MLP-Splatting, a method that decomposes a scene into a few expressive light-field primitives while providing photorealistic novel-view synthesis. MLP-Splatting models each primitive as an independent compact MLP with localized spatial support that predicts radiance and opacity. In contrast to low-level Gaussian primitives or a single global radiance field, our neural primitives provide greater expressive capacity while remaining spatially localized. Rendering is performed through efficient sparse volumetric compositing over ray-primitive interactions. Our primitives are trained with RGB supervision alone, which yields primitives that represent local scene regions often corresponding to objects or object parts; this enables interactive object-level editing without segmentation masks by selecting a handful of primitives. Augmented with optional semantic feature distillation, our method enables open-vocabulary scene interaction and open-set instant segmentation. In our experiments against state-of-the-art semantic 3DGS methods, we achieve substantially lower memory usage (1/15×) and 3× faster rendering. Project Page: https://shinjeongkim.com/mlp-splatting
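The abstract names two mechanisms: sparse volumetric compositing over ray-primitive interactions, and object-level editing by selecting a few primitives. The sketch below illustrates both under stated assumptions. It is not the paper's renderer. Each primitive is assumed to be bounded by a sphere, so a ray is tested against it analytically and samples are drawn only inside the intervals it actually hits. Samples from all hit primitives are sorted by depth and alpha-composited front to back with the standard rule C = Σ_i T_i α_i c_i, where T_i = Π_{j<i} (1 − α_j). A constant colour and opacity stand in for each compact MLP. The names `Primitive`, `render_ray`, `n_samples`, and `hidden` are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Primitive:
    """Hypothetical stand-in for a neural primitive: bounded by a sphere,
    with constant colour/opacity where a compact MLP would be queried."""
    center: np.ndarray
    radius: float
    rgb: np.ndarray
    opacity: float  # per-sample opacity in [0, 1]

    def query(self, pts):
        n = len(pts)
        return np.tile(self.rgb, (n, 1)), np.full(n, self.opacity)


def ray_sphere_interval(origin, direction, center, radius):
    """(t_near, t_far) where a unit-direction ray is inside the sphere, else None."""
    oc = origin - center
    b = np.dot(oc, direction)
    c = np.dot(oc, oc) - radius ** 2
    disc = b * b - c
    if disc < 0.0:
        return None
    s = np.sqrt(disc)
    t0, t1 = -b - s, -b + s
    if t1 <= 0.0:
        return None
    return max(t0, 0.0), t1


def render_ray(origin, direction, primitives, n_samples=8, hidden=frozenset()):
    """Sparse front-to-back compositing: only primitives hit by the ray are queried."""
    direction = direction / np.linalg.norm(direction)
    ts, colors, alphas = [], [], []
    for idx, prim in enumerate(primitives):
        if idx in hidden:  # object-level edit: drop the selected primitives
            continue
        hit = ray_sphere_interval(origin, direction, prim.center, prim.radius)
        if hit is None:
            continue  # ray misses this primitive: no samples, no network query
        t = np.linspace(hit[0], hit[1], n_samples)
        rgb, alpha = prim.query(origin + t[:, None] * direction)
        ts.append(t)
        colors.append(rgb)
        alphas.append(alpha)
    if not ts:
        return np.zeros(3)  # empty ray: black background
    t = np.concatenate(ts)
    order = np.argsort(t)  # merge samples from overlapping primitives by depth
    rgb = np.concatenate(colors)[order]
    alpha = np.concatenate(alphas)[order]
    # Transmittance before each sample: T_i = prod_{j<i} (1 - alpha_j)
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(axis=0)
```

For example, with a red primitive in front of a blue one on the same ray, the rendered colour is almost pure red. Passing `hidden={0}` removes the red primitive, and the same ray then shows the blue one. This mirrors, in a simplified form, the idea of editing an object by selecting the handful of primitives that represent it.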