End-to-End Optimization of Incoherent Imaging for Classification Under Detector-Limited Readout

2026-06-08 • Computer Vision and Pattern Recognition


AI summary

The authors study when combining specially designed optical parts (phase masks) with neural networks helps computers recognize objects from images. They find that these optical designs offer noticeable improvements only when the detector readout is limited, for example when the detector samples the image coarsely or takes only a few measurements. When the detector reads out the full image, a conventional focusing lens works just as well and the optimized design does not help. The benefits also depend on the noise level (they shrink as detector noise grows) and on the spatial-frequency content of the images. The authors back up their findings with theory and with tests on synthetic data and common image datasets.

end-to-end optimization, phase mask, incoherent imaging, optical front-end, neural network back-end, detector readout, mutual information, class separability, spatial frequency, object classification

Authors

Archer Wang, Joshua Chen, Sachin Vaidya, Marin Soljačić

Abstract

End-to-end co-optimization of optical front-ends (e.g. metasurfaces) and neural network back-ends has been widely applied to imaging tasks, yet a formalism characterizing when and why such systems outperform conventional lens-based imaging is largely lacking. This paper focuses on object classification, a central imaging task, and asks when end-to-end optimization of a phase mask for incoherent imaging improves performance over a conventional focusing lens. We find that these gains arise primarily under constrained detector readout and are limited under full detector readout. In the latter setting, we prove that no incoherent phase mask exceeds the ideal-channel mutual information between detector measurements and class labels; a conventional focusing lens approaches this ceiling, and joint optimization yields no empirical gain. When detector readout is constrained -- by coarse spatial sampling or a limited number of measurements -- optimized optics can substantially improve classification by increasing class separability in the detector measurements. These gains are largest under low detector noise and shrink as noise grows, because the optics shape the signal before it reaches the detector but cannot remove noise added afterward. The advantage also depends on the spectral structure of the task: co-design helps most when class-discriminative content is concentrated at lower spatial frequencies than within-class variation. We develop a theoretical framework formalizing these distinctions and test its predictions on synthetic data and standard benchmarks (MNIST, FashionMNIST, SVHN).
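
The measurement chain described in the abstract (a phase mask shapes the signal, the detector reads it out coarsely, and noise is added afterward) can be illustrated with a minimal NumPy sketch. This is a toy model under stated assumptions, not the authors' implementation: the function names `psf_from_phase` and `measure`, the square full-aperture pupil, circular (FFT) convolution, sum-binning as the "coarse spatial sampling", and additive Gaussian noise are all choices made here for illustration.

```python
import numpy as np

def psf_from_phase(phase):
    """Incoherent PSF of a phase-only pupil (assumed square, fully open aperture).

    The PSF is the squared magnitude of the Fourier transform of the pupil.
    """
    pupil = np.exp(1j * phase)
    field = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))
    psf = np.abs(field) ** 2
    return psf / psf.sum()  # normalize so the optics conserve total intensity

def measure(scene, phase, bin_size=1, noise_std=0.0, rng=None):
    """Toy detector measurement: blur by the PSF, bin pixels, then add noise.

    Assumes scene and phase have the same shape and that both image
    dimensions are divisible by bin_size.
    """
    psf = psf_from_phase(phase)
    # Circular convolution via FFT; ifftshift moves the PSF center to index (0, 0).
    otf = np.fft.fft2(np.fft.ifftshift(psf))
    blurred = np.real(np.fft.ifft2(np.fft.fft2(scene) * otf))
    # Coarse readout: sum intensity over bin_size x bin_size blocks.
    h, w = blurred.shape
    binned = blurred.reshape(h // bin_size, bin_size,
                             w // bin_size, bin_size).sum(axis=(1, 3))
    # Noise enters after the optics, so no phase mask can remove it.
    rng = np.random.default_rng() if rng is None else rng
    return binned + noise_std * rng.normal(size=binned.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene = rng.random((8, 8))
    mask = rng.uniform(0.0, 2.0 * np.pi, size=(8, 8))  # hypothetical mask values
    y = measure(scene, mask, bin_size=4, noise_std=0.01, rng=rng)
    print(y.shape)
```

In this toy model a flat (all-zero) phase gives a single-pixel PSF, so with `bin_size=1` and no noise the measurement reproduces the scene; that plays the role of the conventional focusing lens. In an end-to-end setup, `phase` would be the trainable optical parameter and the output of `measure` would feed the classifier.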
