From Gaze to Guidance: Interpreting and Adapting to Users' Cognitive Needs with Multimodal Gaze-Aware AI Assistants

2026-04-09 | Human-Computer Interaction

Human-Computer Interaction, Artificial Intelligence
AI summary

The authors created an AI assistant that watches where people look on screen to figure out when they might be having trouble reading. They tested this by comparing it to a normal text-only AI assistant with 36 participants. The gaze-aware assistant was better at understanding users' reading struggles, helped them remember information more, and needed fewer words to communicate. However, when it misinterpreted gaze, some users found it less helpful. Overall, the authors show that using eye-tracking with AI can improve how well people learn and understand information.

Large Language Models (LLM), Egocentric Video, Gaze Tracking, Multimodal AI, User Assistance, Cognitive Load, Retrospective Assistance, Human-Computer Interaction, Reading Comprehension, Qualitative Study
Authors
Valdemar Danry, Javier Hernandez, Andrew Wilson, Pattie Maes, Judith Amores
Abstract
Current LLM assistants are powerful at answering questions, but they have limited access to the behavioral context that reveals when and where a user is struggling. We present a gaze-grounded multimodal LLM assistant that uses egocentric video with gaze overlays to identify likely points of difficulty and to target retrospective follow-up assistance. We evaluate this approach in a controlled study (n=36) comparing the gaze-aware assistant to a text-only LLM assistant. Compared to the text-only assistant, the gaze-aware assistant was rated as significantly more accurate and personalized in its assessments of users' reading behavior, and it significantly improved people's ability to recall information. Users also spoke significantly fewer words with the gaze-aware assistant, indicating more efficient interactions. Qualitative results underscored both perceived benefits for comprehension and challenges when interpretations of gaze behavior were inaccurate. Our findings suggest that gaze-aware LLM assistants can reason about users' cognitive needs and thereby improve their cognitive outcomes.