Implicit Data Synthesis for Contrastive Unsupervised Data Augmentation

2026-06-05 | Computer Vision and Pattern Recognition

AI summary

The authors point out that scientific data is often unlabeled and laborious to label by hand, which makes unsupervised learning methods valuable. They focus on contrastive learning, which usually augments the data itself to create new samples; for scientific observations, however, altering the data directly can distort what it represents. Instead, the authors propose perturbing the neural network's weights to create the different views used for learning, leaving the original data intact. They demonstrate the idea with a SimCLR-based pipeline on radar observations of meteors and report performance gains under matched protocols.

unsupervised learning, contrastive learning, SimCLR, data augmentation, neural network weights, radar observations, meteors, structural representations, data perturbation
Authors
Patrick Kage, Trevor Hedges, N. Siddharth, Pavlos Andreadis
Abstract
Scientific observations generate large quantities of unlabeled data that is laborious to hand-label, making unsupervised learning techniques valuable for processing such datasets. Among these approaches, contrastive learning provides a convenient mechanism for extracting structural representations from unannotated datasets. For natural imagery, the usual approach is to apply a variety of data-space augmentation methods to generate synthetic samples; for scientific observations, however, data-space perturbations can fundamentally alter the underlying data. Our proposed method is to generate contrastive samples by perturbing the network weights rather than the underlying data, thus more closely preserving the structure of the data. We demonstrate this technique using a SimCLR-based pipeline applied to radar observations of meteors, and show performance gains under matched protocols.
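
The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the weights are perturbed, so additive Gaussian noise with a hypothetical scale `sigma` is assumed here, and the two-layer encoder, the random stand-in batch, and the function names `encode`, `perturb`, and `nt_xent` are likewise illustrative. The loss is the standard NT-Xent objective used by SimCLR. The two views of each sample come from two noisy copies of the encoder weights, while the input batch is left untouched.

```python
import numpy as np

def encode(x, weights):
    """Toy two-layer MLP encoder (illustrative); returns L2-normalised embeddings."""
    w1, w2 = weights
    h = np.maximum(x @ w1, 0.0)                   # ReLU hidden layer
    z = h @ w2
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)

def perturb(weights, sigma, rng):
    """Copy the weights and add Gaussian noise (an assumed perturbation scheme)."""
    return [w + sigma * rng.standard_normal(w.shape) for w in weights]

def nt_xent(z_a, z_b, temperature=0.5):
    """SimCLR NT-Xent loss for two batches of normalised embeddings."""
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)        # shape (2n, d)
    sim = (z @ z.T) / temperature                 # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                # exclude self-pairs
    # The positive for row i is row i + n, and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))                  # stand-in batch, not real radar data
x_before = x.copy()
weights = [0.1 * rng.standard_normal((16, 32)),
           0.1 * rng.standard_normal((32, 4))]

# Two views of the same, unmodified batch via two independent weight perturbations.
z_a = encode(x, perturb(weights, sigma=0.05, rng=rng))
z_b = encode(x, perturb(weights, sigma=0.05, rng=rng))
loss = nt_xent(z_a, z_b)
print(float(loss))
```

In a training loop the gradient of this loss would be taken with respect to the unperturbed weights; that step, and the choice of `sigma`, are left out because the abstract gives no detail on either.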