Giving Sensors a Voice: Multimodal JEPA for Semantic Time-Series Embeddings
2026-05-29 • Machine Learning
AI summary
The authors developed CHARM (Channel-Aware Representation Model), a Transformer-based encoder for multivariate time series recorded by heterogeneous sensors. CHARM takes a textual description of each sensor channel as an additional input and is equivariant to channel order. It is trained with a Joint Embedding Predictive Architecture (JEPA), which makes its predictions in latent space, together with a loss that encourages informative, temporally stable embeddings and robustness to sensor noise. The authors tested the learned embeddings on anomaly detection, classification, and short- and long-term forecasting, reporting strong results with only a linear probe on top. Performance comes mainly from the JEPA objective and the conditioning architecture, while the channel descriptions act as channel identifiers that help the model generalize across datasets.
Transformer, Multivariate time series, Representation learning, Channel descriptions, Equivariance, Joint Embedding Predictive Architecture (JEPA), Latent-space prediction, Anomaly detection, Time series forecasting, Sensor noise
Authors
Utsav Dutta, Gerardo Pastrana, Sina Khoshfetrat Pakazad, Henrik Ohlsson
Abstract
Transformer-based architectures have advanced sequence modeling in language and vision, yet general-purpose representation learning for heterogeneous multivariate time series remains underexplored. We introduce CHARM (Channel-Aware Representation Model), which incorporates channel-level textual descriptions into a Transformer encoder equivariant to channel order. CHARM is trained with a Joint Embedding Predictive Architecture (JEPA) and a novel loss promoting informative, temporally stable embeddings; latent-space prediction encourages robustness to sensor noise while description-aware gating provides interpretability through learned inter-channel relationships. Across anomaly detection, classification, and short- and long-term forecasting, the learned embeddings achieve strong performance using only a linear probe. Performance is driven primarily by the JEPA objective and conditioning architecture, with text descriptions serving as channel identifiers for cross-dataset generalization.
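
The abstract states that the encoder is equivariant to channel order but does not say how this is achieved. One common route, assumed here purely for illustration and not taken from the paper, is self-attention across channels with no channel positional encoding: permuting the input channels then permutes the output rows in the same way. The sketch below checks that property numerically with the Python standard library only. The function names, matrix sizes, random weights, and the particular permutation are all hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(x, w_q, w_k, w_v):
    """Single-head self-attention across channels, without positional encoding.

    x holds one embedding (row) per channel; the output has the same layout.
    This is a generic attention layer, not CHARM's actual architecture.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    k_transposed = [list(col) for col in zip(*k)]
    scores = matmul(q, k_transposed)  # scores[i][j] = q_i . k_j
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)


def random_matrix(rows, cols, rng):
    """Matrix of independent standard-normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_channels, dim = 5, 4  # hypothetical sizes
x = random_matrix(num_channels, dim, rng)
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

perm = [2, 0, 4, 1, 3]  # an arbitrary reordering of the five channels

out = channel_attention(x, w_q, w_k, w_v)
out_of_permuted = channel_attention([x[i] for i in perm], w_q, w_k, w_v)
permuted_out = [out[i] for i in perm]

# Equivariance: encoding the permuted channels equals permuting the encoding.
max_gap = max(
    abs(a - b)
    for row_a, row_b in zip(out_of_permuted, permuted_out)
    for a, b in zip(row_a, row_b)
)
is_equivariant = max_gap < 1e-9
print("equivariant to channel order:", is_equivariant)
```

The check passes up to floating-point rounding because every step treats channels symmetrically: the projections act row by row, and the softmax-weighted sum runs over all channels regardless of their order. Adding a per-channel positional encoding would break this symmetry, which is one reason a model might instead identify channels by their descriptions, as the abstract suggests.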