Latent Equivariant Operators for Robust Object Recognition: Promise and Challenges

2026-02-20

Computer Vision and Pattern Recognition · Machine Learning
AI summary

The authors explore how to improve deep learning models when recognizing objects that appear in unusual ways, like rotated or moved images. They focus on a type of network that learns how to handle these changes by observing examples, instead of needing prior knowledge about the transformations. Using simple tests with rotated and shifted noisy MNIST digits, they show their approach works better for recognizing new, unseen data compared to traditional methods. However, they note that making this work for more complex images still presents challenges.

deep learning · computer vision · equivariant neural networks · latent space · symmetric transformations · out-of-distribution classification · MNIST dataset · image rotation · image translation
Authors
Minh Dinh, Stéphane Deny
Abstract
Despite the successes of deep learning in computer vision, difficulties persist in recognizing objects that have undergone group-symmetric transformations rarely seen during training, for example objects seen in unusual poses, scales, positions, or combinations thereof. Equivariant neural networks are a solution to the problem of generalizing across symmetric transformations, but require knowledge of the transformations a priori. An alternative family of architectures proposes to learn equivariant operators in a latent space from examples of symmetric transformations. Here, using simple datasets of rotated and translated noisy MNIST, we illustrate how such architectures can successfully be harnessed for out-of-distribution classification, thus overcoming the limitations of both traditional and equivariant networks. While conceptually enticing, we discuss challenges ahead on the path of scaling these architectures to more complex datasets.
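The core idea of learning an equivariant operator in latent space can be sketched on a toy problem. The following is a minimal illustration, not the paper's architecture: the fixed linear "encoder" E, the latent operator W, and the 2-D rotation data are all invented here. Given pairs (x, g·x) of an input and its transformed version, we fit a linear operator W so that encoding then applying W matches encoding the transformed input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy symmetry: a fixed rotation of 2-D points by 30 degrees.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Hypothetical fixed linear encoder from 2-D inputs to a 4-D latent space.
E = rng.normal(size=(4, 2))

# Training pairs: latent codes of inputs and of their rotated versions.
X = rng.normal(size=(2, 500))   # 500 sample points, one per column
Z = E @ X                       # latents of the original inputs
Z_t = E @ (R @ X)               # latents of the transformed inputs

# Fit the latent operator W by least squares so that W @ Z ≈ Z_t.
W_T, *_ = np.linalg.lstsq(Z.T, Z_t.T, rcond=None)
W = W_T.T

# W now applies the rotation directly in latent space, even on new inputs.
x_new = rng.normal(size=(2, 1))
err = np.linalg.norm(W @ (E @ x_new) - E @ (R @ x_new))
print(err)  # near zero: the learned operator generalizes to unseen points
```

Because the encoder here is linear, the learned W recovers the conjugated rotation exactly on the encoder's image; the papers in this family replace E with a learned deep encoder and learn W from observed transformation pairs alone, without knowing the group in advance.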