MARS: Multi-rate Aggregation of Recency Signals for Sequential Recommendation across Sparse and Dense Regimes

2026-06-02 · Information Retrieval

AI summary

The authors observe that current sequential recommenders weight past user actions either by position (Transformer self-attention) or by a single implicit decay schedule (state-space models), and that neither makes multiple time scales explicit. They propose MARS, an aggregation operator that reads real timestamps, summarizes user behavior at K distinct recency scales, and fuses the summaries with a context-adaptive gate. MARS adds at most 6% parameters and selects between a Transformer encoder (MARS-T) for sparse data and a Mamba encoder (MARS-M) for dense data, based on the average sequence length of the training set. On five public benchmarks, MARS attains the best HR@10 against ten baselines, with the largest gains on sparse data. A backbone-only ablation isolates MARS's contribution, and the code is included in the supplementary material.

sequential recommendation, self-attention, state-space models, temporal dynamics, time scales, Transformer, Mamba, Hit Rate (HR@10), recency weighting, context-adaptive gating
Authors
Zhenyu Yu, Shuigeng Zhou
Abstract
Sequential recommenders weight historical interactions either through positional self-attention as in Transformers or through a single implicit decay schedule as in State-Space Models. Neither makes the multi-scale temporal structure of real user behaviour explicit. We propose MARS, an encoder-agnostic aggregation operator that consumes real timestamps and produces K summaries emphasising distinct recency scales, fused by a context-adaptive gate. MARS adds at most 6% parameters and runs in $\mathcal{O}(LdK)$ time. MARS adapts to data density by automatically selecting between two encoder instantiations: MARS-T (Transformer) for sparse data and MARS-M (Mamba) for dense data, based on the average sequence length of the training set. On five public benchmarks against ten Transformer- and Mamba-based baselines under a unified RecBole protocol, MARS attains the best HR@10 on every benchmark, with mean relative gain +19.7% over the strongest content-only Transformer baseline on sparse data (reaching +36.2% on Games) and +3.2% HR@10 / +0.9% NDCG over SIGMA on dense ML-1M at 42% fewer MFLOPs, occupying the accuracy-efficiency Pareto frontier across the data-density spectrum. A backbone-only ablation isolates the marginal contribution of MARS at +4% to +19% HR@10 on sparse data and motivates the dual-instantiation design. The code is included in the supplementary material.
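
To make the idea of multi-rate aggregation concrete, here is a minimal NumPy sketch of one way K recency-scaled summaries could be computed from real timestamps and fused by a gate in $\mathcal{O}(LdK)$ time. It is not the authors' implementation. The exponential-decay kernel, the softmax gate, the use of a single context vector, the function names, and the length threshold in `pick_encoder` are all illustrative assumptions that the abstract does not specify.

```python
import numpy as np

def multi_rate_summaries(h, t, taus):
    """Recency-weighted averages of hidden states at K time scales.

    h:    (L, d) encoder outputs for L interactions
    t:    (L,)   real timestamps, ascending (e.g. seconds)
    taus: (K,)   time constants, one per scale (ASSUMED exponential kernel)
    Returns a (K, d) array; cost is O(L * d * K).
    """
    age = t[-1] - t                            # (L,) time since the latest event
    w = np.exp(-age[None, :] / taus[:, None])  # (K, L); age >= 0, so no overflow
    w /= w.sum(axis=1, keepdims=True)          # normalise weights per scale
    return w @ h                               # (K, d)

def gated_fusion(summaries, context, W_gate):
    """Fuse K summaries with a softmax gate computed from a context vector.

    summaries: (K, d), context: (d,), W_gate: (d, K) (hypothetical gate weights)
    Returns the fused (d,) vector and the (K,) gate values.
    """
    scores = context @ W_gate                  # (K,)
    scores = scores - scores.max()             # numerical stability
    g = np.exp(scores)
    g /= g.sum()
    return g @ summaries, g

def pick_encoder(avg_seq_len, threshold=50.0):
    """Choose an instantiation from average training sequence length.

    The threshold value is a placeholder, not a number from the paper.
    """
    return "MARS-M" if avg_seq_len >= threshold else "MARS-T"

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, d = 6, 4
    h = rng.normal(size=(L, d))
    t = np.array([0.0, 3600.0, 86400.0, 90000.0, 600000.0, 604800.0])
    taus = np.array([3600.0, 86400.0, 604800.0])   # hour, day, week
    S = multi_rate_summaries(h, t, taus)
    fused, gate = gated_fusion(S, h[-1], rng.normal(size=(d, len(taus))))
    print(S.shape, fused.shape, gate.round(3))
```

In this sketch a small time constant concentrates weight on the most recent interaction, while a large one approaches a plain average over the history, so the K summaries span short-term to long-term behaviour and the gate decides how to mix them per user context.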