CustomDancer: Customized Dance Recommendation by Text-Dance Retrieval

2026-05-01 · Multimedia

AI summary

The authors created TD-Data, a large collection of short (12-second) dance clips annotated by professional dance experts, to help computers learn to match dance movements with text. They also developed CustomDancer, a system that combines music and motion information with text to retrieve dances from written queries. On the new dataset, their method outperformed previous approaches, reaching 10.23% Recall@1. The work aims to make it easier to find specific dance styles or moves using natural language.

dance retrieval, multimodal learning, CLIP model, motion dynamics, text encoding, music-motion blending, dataset annotation, Recall@1, choreographic intent, user preference study
Authors
Yawen Qin, Ke Qiu, Qin Zhang
Abstract
Dance serves as both a cultural cornerstone and a medium for personal expression, yet the rapid growth of online dance content has made personalized discovery increasingly difficult. Text-based dance retrieval offers a natural interface for users to search with choreographic intent, but it remains underexplored because retrieving dance requires simultaneous reasoning over linguistic semantics, musical rhythm, and full-body motion dynamics. We introduce TD-Data, a large-scale open dataset for text-dance retrieval, comprising about 4,000 dance clips of 12 seconds each, 14.6 hours of motion across 22 genres, and annotations from professional dance experts. On top of this dataset, we propose CustomDancer, a multimodal retrieval framework that aligns text with dance through a CLIP-based text encoder, music and motion encoders, and a music-motion blending module. CustomDancer achieves state-of-the-art performance on TD-Data, reaching 10.23% Recall@1 and improving retrieval quality in both quantitative benchmarks and user preference studies.
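The abstract names CustomDancer's components (a CLIP-based text encoder, music and motion encoders, and a music-motion blending module) and reports Recall@1, but it does not say how the components are combined. As an illustration only, the Python sketch below shows one common way to score text-to-dance retrieval, assuming each encoder has already produced a fixed-size vector in a shared embedding space. The blending function (`blend_music_motion`, with weight `alpha`) is a hypothetical placeholder, not the paper's module, and the other helper names are likewise invented for this example. Clips are ranked by cosine similarity, and Recall@K is the fraction of text queries whose paired clip appears in the top K; a Recall@1 of 10.23% means roughly one query in ten retrieves its matching clip first.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def blend_music_motion(music: Sequence[float], motion: Sequence[float],
                       alpha: float = 0.5) -> Vector:
    """HYPOTHETICAL stand-in for a music-motion blending module: a convex
    combination of the unit-normalized music and motion embeddings,
    re-normalized. The paper's actual module is not specified here."""
    if len(music) != len(motion):
        raise ValueError("music and motion embeddings must share a dimension")
    m, d = l2_normalize(music), l2_normalize(motion)
    return l2_normalize([alpha * a + (1.0 - alpha) * b for a, b in zip(m, d)])


def cosine_similarity_matrix(queries: Sequence[Sequence[float]],
                             gallery: Sequence[Sequence[float]]) -> List[Vector]:
    """sim[i][j] = cosine similarity between text query i and dance clip j."""
    q = [l2_normalize(v) for v in queries]
    g = [l2_normalize(v) for v in gallery]
    return [[sum(a * b for a, b in zip(qi, gj)) for gj in g] for qi in q]


def recall_at_k(sim: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose paired clip (gallery index == query index)
    ranks within the top k. Ties are counted against the query."""
    hits = 0
    for i, row in enumerate(sim):
        # Number of other clips scoring at least as high as the true match.
        rank = sum(1 for j, s in enumerate(row) if j != i and s >= row[i])
        if rank < k:
            hits += 1
    return hits / len(sim)


# Toy usage: 2-D vectors standing in for encoder outputs (text i pairs with clip i).
texts = [[1.0, 0.0], [0.2, 0.8], [0.0, 1.0]]
dances = [[0.9, 0.1], [0.1, 0.9], [0.3, 0.7]]
sim = cosine_similarity_matrix(texts, dances)
# The third query ranks its own clip second, so Recall@1 = 2/3 and Recall@2 = 1.0.
print(recall_at_k(sim, 1), recall_at_k(sim, 2))
```

Counting ties against the query keeps the metric conservative. In a trained system, a learned blending module would presumably replace the fixed `alpha`, and the toy vectors would come from the actual text, music, and motion encoders.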