Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction

2026-02-26 · Human-Computer Interaction

Human-Computer Interaction, Artificial Intelligence, Machine Learning, Robotics
AI summary

The authors studied how small language models (SLMs) can help robots decide who is the leader and who is the follower during an interaction, a decision that matters for smooth communication. They created a dedicated dataset and tested two ways to adapt the models: prompt engineering and fine-tuning. In their experiments, a fine-tuned model evaluated without an in-context example (zero-shot) worked best, reaching 86.66% accuracy with low latency. Adding an in-context example (one-shot), however, made the models less reliable, revealing a trade-off between handling longer, more complex conversational context and running dependably on small devices.

Keywords
leader-follower interaction, human-robot interaction, small language models, large language models, prompt engineering, fine-tuning, zero-shot learning, one-shot learning, role classification, edge computing
Authors
Rafael R. Baptista, André de Lima Salgado, Ricardo V. Godoy, Marcelo Becker, Thiago Boaventura, Gustavo J. G. Lahr
Abstract
Leader-follower interaction is an important paradigm in human-robot interaction (HRI). Yet, assigning roles in real time remains challenging for resource-constrained mobile and assistive robots. While large language models (LLMs) have shown promise for natural communication, their size and latency limit on-device deployment. Small language models (SLMs) offer a potential alternative, but their effectiveness for role classification in HRI has not been systematically evaluated. In this paper, we present a benchmark of SLMs for leader-follower communication, introducing a novel dataset derived from a published database and augmented with synthetic samples to capture interaction-specific dynamics. We investigate two adaptation strategies, prompt engineering and fine-tuning, under zero-shot and one-shot interaction modes, and compare both with an untrained baseline. Experiments with Qwen2.5-0.5B reveal that zero-shot fine-tuning achieves robust classification performance (86.66% accuracy) while maintaining low latency (22.2 ms per sample), significantly outperforming the baseline and prompt-engineered approaches. However, results also indicate a performance degradation in one-shot modes, where the increased context length challenges the model's architectural capacity. These findings demonstrate that fine-tuned SLMs provide an effective solution for direct role assignment, while highlighting critical trade-offs between dialogue complexity and classification reliability on edge devices.
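To make the zero-shot vs. one-shot distinction concrete, the sketch below builds the two kinds of classification prompts the abstract contrasts. This is an illustrative reconstruction, not the authors' code: the template wording, the `build_prompt` function, and the example utterances are all assumptions; the only grounded ideas are that the model labels each utterance as leader or follower, and that one-shot mode prepends a demonstration, lengthening the context the SLM must process.

```python
# Hypothetical sketch of zero-shot vs. one-shot prompt construction for
# leader/follower role classification. Names and templates are illustrative.

ROLES = ("leader", "follower")

def build_prompt(utterance, example=None):
    """Return a role-classification prompt for an SLM.

    Zero-shot when `example` is None; one-shot when a single
    (utterance, role) demonstration is prepended, which increases
    the context length the model has to handle.
    """
    header = (
        "Classify the speaker's role in a leader-follower interaction.\n"
        "Answer with exactly one word: leader or follower.\n\n"
    )
    shot = ""
    if example is not None:
        ex_utt, ex_role = example
        assert ex_role in ROLES  # demonstrations must use a valid label
        shot = f"Utterance: {ex_utt}\nRole: {ex_role}\n\n"
    return f"{header}{shot}Utterance: {utterance}\nRole:"

# Zero-shot: the query utterance only.
zero_shot = build_prompt("Move the box to the left shelf, now.")

# One-shot: one labeled demonstration precedes the query.
one_shot = build_prompt(
    "Okay, I'll follow your lead.",
    example=("Take the tool and hand it to me.", "leader"),
)
```

The one-shot prompt is strictly longer than the zero-shot one, which is the context-length cost the abstract identifies as the source of the one-shot performance degradation on a 0.5B-parameter model.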