Hallucination in World Models is Predictable and Preventable
2026-06-25 • Machine Learning
Machine Learning, Computer Vision and Pattern Recognition, Robotics
AI summary
The authors studied why generative world models sometimes make believable but incorrect predictions, called hallucinations. They found that these errors happen mostly in parts of the environment where the model has seen little data. By creating a large new dataset (MMBench2) and training a world model on it, they identified specific types of hallucinations and developed ways to detect them early. Using these signals, they improved training and fine-tuning strategies to reduce hallucinations efficiently, even in new settings with limited real data. Their work shows that hallucinations are tied to gaps in training data and that detecting these gaps helps fix the problem.
generative world models, hallucination, state-action space, data coverage, MMBench2 dataset, world model training, hallucination detection, curiosity rewards, fine-tuning, simulator
Authors
Nicklas Hansen, Xiaolong Wang
Abstract
Modern generative world models render increasingly realistic action-controllable futures, yet they frequently hallucinate: rollouts remain visually fluent while drifting from the ground-truth dynamics. We hypothesize that hallucination concentrates in low-coverage regions of the state-action space, where lightweight data-centric signals can both detect it and guide mitigation. To test this, we introduce MMBench2, a 427-hour, 210-task dataset for visual world modeling with ground-truth actions, rewards, and live simulators, and train a 350M-parameter world model on it. We identify three distinct hallucination modes (perceptual, action-marginalized, and scene-diverging), each anchored to a different stage of the pipeline, and develop three signals that accurately predict where the model will fail. To close coverage gaps at training time, we develop a coverage-aware sampling technique; to close them online, our hallucination predictors serve as curiosity rewards for targeted data collection, yielding a data-efficient finetuning recipe that adapts the pretrained world model to entirely unseen environments with as few as 50 real environment trajectories. Overall, our findings reveal that hallucination in world models is inherently a data coverage issue, and that the same signals used to detect it can also be used for mitigation. An interactive web version of our paper is available at https://www.nicklashansen.com/mmbench2
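The abstract does not spell out how the coverage-aware sampling works, so the following is only a generic illustration of the underlying idea, not the authors' method. It assumes each training sample has already been assigned to a discrete state-action bin (the `bin_ids` array, the `alpha` temperature, and both function names are hypothetical) and draws samples with probability inversely related to how crowded their bin is, so that low-coverage regions are seen more often during training.

```python
import numpy as np

def coverage_aware_weights(bin_ids, alpha=1.0):
    """Per-sample sampling probabilities that upweight rarely visited bins.

    bin_ids: integer array with one (hypothetical) state-action bin per sample.
    alpha:   0 gives uniform sampling; 1 gives every occupied bin equal total mass.
    """
    bin_ids = np.asarray(bin_ids).reshape(-1)
    # Count how many samples fall into each occupied bin.
    _, inverse, counts = np.unique(bin_ids, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)
    # Weight each sample by its bin count raised to -alpha, then normalize.
    weights = counts[inverse].astype(np.float64) ** (-alpha)
    return weights / weights.sum()

def sample_batch(bin_ids, batch_size, alpha=1.0, seed=0):
    """Draw sample indices (with replacement) using the coverage-aware weights."""
    rng = np.random.default_rng(seed)
    probs = coverage_aware_weights(bin_ids, alpha)
    return rng.choice(len(probs), size=batch_size, replace=True, p=probs)
```

With `alpha=1.0`, a bin holding a single sample receives the same total probability mass as a bin holding thousands; intermediate values of `alpha` trade off between the raw data distribution and fully balanced coverage.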