SEAL: Semantic-aware Single-image Sticker Personalization with a Large-scale Sticker-tag Dataset

2026-04-29
Computer Vision and Pattern Recognition

AI summary

The authors study how to create personalized stickers from a single reference image using diffusion models. This is difficult because test-time fine-tuning on one image tends to overfit: background details get absorbed into the learned concept, and the model memorizes the reference's spatial layout, losing editing flexibility. They propose SEAL, a plug-and-play adaptation module that guides the model toward the subject's semantics and structure during embedding adaptation without modifying the diffusion backbone. They also build StickerBench, a large-scale sticker dataset with structured attribute tags for testing how well a model preserves a sticker's identity while allowing attribute edits. Experiments show SEAL improves both identity preservation and editing control.

diffusion models, personalized text-to-image generation, single-image synthesis, test-time fine-tuning, spatial attention, embedding adaptation, identity preservation, contextual controllability, sticker dataset, token strategy
Authors
Changhyun Roh, Yonghyun Jeong, Jonghyun Lee, Chanho Eom, Jihyong Oh
Abstract
Synthesizing a target concept from a single reference image is challenging in diffusion-based personalized text-to-image generation, particularly for sticker personalization where prompts often require explicit attribute edits. With only one reference, test-time fine-tuning (TTF) methods tend to overfit, producing visual entanglement, where background artifacts are absorbed into the learned concept, and structural rigidity, where the model memorizes reference-specific spatial configurations and loses contextual controllability. To address these issues, we introduce SEmantic-aware single-image sticker personALization (SEAL), a plug-and-play, architecture-agnostic adaptation module that integrates into existing personalization pipelines without modifying their U-Net-based diffusion backbones. SEAL applies three components during embedding adaptation: (1) a Semantic-guided Spatial Attention Loss, (2) a Split-merge Token Strategy, and (3) Structure-aware Layer Restriction. To support sticker-domain personalization with attribute-level control, we present StickerBench, a large-scale sticker image dataset with structured tags under a six-attribute schema (Appearance, Emotion, Action, Camera Composition, Style, Background). These annotations provide a consistent interface for varying context while keeping target identity fixed, enabling systematic evaluation of identity disentanglement and contextual controllability. Experiments show that SEAL consistently improves identity preservation while maintaining contextual controllability, highlighting the importance of explicit spatial and structural constraints during test-time adaptation. The code, StickerBench, and project page will be publicly released.
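The abstract describes StickerBench annotations as structured tags under a six-attribute schema that lets context vary while identity stays fixed. A minimal sketch of what one annotation record and an attribute-level edit could look like follows; the field layout, tag values, and `edit_prompt` helper are illustrative assumptions, not the released dataset format.

```python
from dataclasses import dataclass, replace

@dataclass
class StickerTags:
    """One hypothetical StickerBench annotation under the six-attribute
    schema named in the abstract: Appearance, Emotion, Action,
    Camera Composition, Style, Background."""
    appearance: str
    emotion: str
    action: str
    camera_composition: str
    style: str
    background: str

def edit_prompt(tags: StickerTags, **overrides: str) -> StickerTags:
    """Hypothetical helper: vary context by replacing only the requested
    attributes, leaving the identity-bearing ones untouched."""
    return replace(tags, **overrides)

# Example: change only the emotion; all other attributes stay fixed,
# mirroring the "consistent interface for varying context" idea.
base = StickerTags(
    appearance="orange cat with round glasses",
    emotion="happy",
    action="waving",
    camera_composition="close-up",
    style="flat cartoon",
    background="plain white",
)
edited = edit_prompt(base, emotion="surprised")
```

Such records would let an evaluation loop enumerate single-attribute edits systematically, which matches the abstract's framing of testing identity disentanglement against contextual controllability.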