ID and Graph View Contrastive Learning with Multi-View Attention Fusion for Sequential Recommendation
2026-04-15 • Information Retrieval
Information Retrieval • Machine Learning
AI summary
The authors study how to better predict which items a user will interact with next, based on their past behavior. They combine two views of the data: one that treats user-item interactions as sequences of item IDs, and another that views items as connected in a graph. Their method, called MVCrec, uses three contrastive learning objectives (within the sequential view, within the graph view, and across the two views) to improve user and item representations, and it combines the views with an attention mechanism. Experiments on five real-world datasets show that it outperforms 11 state-of-the-art methods. The authors also release their code and data.
Sequential recommendation • Contrastive learning • Graph neural networks • User-item interaction • Multi-view learning • Attention mechanism • NDCG • HitRatio • Representation learning
Authors
Xiaofan Zhou, Kyumin Lee
Abstract
Sequential recommendation has become increasingly prominent in both academia and industry, particularly in e-commerce. Its primary goal is to extract user preferences from historical interaction sequences and predict the items a user is likely to engage with next. Recent advances have leveraged contrastive learning and graph neural networks to learn more expressive representations from interaction histories: graphs capture relational structure between nodes, while ID-based representations encode item-specific information. However, few studies have explored multi-view contrastive learning between ID and graph perspectives to jointly improve user and item representations, especially in settings where only interaction data is available and no auxiliary information exists. To address this gap, we propose Multi-View Contrastive learning for sequential recommendation (MVCrec), a framework that integrates complementary signals from both sequential (ID-based) and graph-based views. MVCrec incorporates three contrastive objectives: one within the sequential view, one within the graph view, and one across the two views. To fuse the learned representations effectively, we introduce a multi-view attention fusion module that combines global and local attention mechanisms to estimate the likelihood that a target user purchases a target item. Comprehensive experiments on five real-world benchmark datasets demonstrate that MVCrec consistently outperforms 11 state-of-the-art baselines, achieving improvements of up to 14.44% in NDCG@10 and 9.22% in HitRatio@10 over the strongest baseline. Our code and datasets are available at https://github.com/sword-Lz/MMCrec.
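The abstract names MVCrec's three contrastive objectives but does not give their exact form. The following is a minimal sketch that assumes each objective is a standard in-batch InfoNCE loss over L2-normalized embeddings. This is a common choice in contrastive sequential recommendation, but it is not confirmed for MVCrec. In the sketch, the two within-view terms contrast two augmented versions of the same user (or item) representation, and the cross-view term contrasts a sequential-view representation with the graph-view representation of the same user. The helper names (`info_nce`, `multi_view_contrastive_loss`), the augmented inputs, the temperature of 0.2, and the equal loss weights are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _l2_normalize(v: Vector) -> List[float]:
    """Scale v to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def info_nce(anchors: List[Vector], positives: List[Vector],
             temperature: float = 0.2) -> float:
    """In-batch InfoNCE: anchors[i] is pulled toward positives[i], and every
    positives[j] with j != i serves as a negative for anchors[i]."""
    a = [_l2_normalize(v) for v in anchors]
    p = [_l2_normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Temperature-scaled cosine similarity to every candidate in the batch.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]  # -log softmax probability of the positive
    return total / len(a)


def multi_view_contrastive_loss(
    seq_aug1: List[Vector], seq_aug2: List[Vector],
    graph_aug1: List[Vector], graph_aug2: List[Vector],
    seq_repr: List[Vector], graph_repr: List[Vector],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
    temperature: float = 0.2,
) -> float:
    """Hypothetical weighted sum of the three objectives named in the abstract."""
    w_seq, w_graph, w_cross = weights
    loss_seq = info_nce(seq_aug1, seq_aug2, temperature)        # within the sequential (ID) view
    loss_graph = info_nce(graph_aug1, graph_aug2, temperature)  # within the graph view
    loss_cross = info_nce(seq_repr, graph_repr, temperature)    # across the two views
    return w_seq * loss_seq + w_graph * loss_graph + w_cross * loss_cross


if __name__ == "__main__":
    # Toy batch of two users with made-up 2-d embeddings.
    seq_a = [[1.0, 0.1], [0.2, 1.0]]
    seq_b = [[0.9, 0.0], [0.1, 0.8]]
    graph_a = [[0.8, 0.3], [0.0, 1.0]]
    graph_b = [[1.0, 0.2], [0.3, 0.9]]
    print(multi_view_contrastive_loss(seq_a, seq_b, graph_a, graph_b, seq_a, graph_a))
```

Using in-batch negatives means every other user in the batch acts as a negative, so no separate negative sampling is needed. MVCrec's actual augmentations, negatives, and loss weighting, and how these losses interact with the attention fusion module, may differ from this sketch.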