Agentic Discovery with Active Hypothesis Exploration for Visual Recognition
2026-04-14 • Computer Vision and Pattern Recognition
AI summary
The authors created HypoExplore, a system that designs neural networks for image recognition by treating architecture design as a scientific experiment. Starting from a broad, human-specified research direction, HypoExplore generates and tests new network ideas, balancing the reuse of principles that have already been validated against the exploration of hypotheses whose validity is still uncertain. It records every proposed architecture and its results, so that confidence in each design principle is updated as evidence accumulates. On CIFAR-10, the best discovered architecture reached 94.11% accuracy, evolved from a baseline at 18.91%, and the discovered designs generalized to CIFAR-100 and Tiny-ImageNet. The authors also found that HypoExplore helps explain why certain designs perform better, rather than only finding good ones.
Neural Architecture Search, Visual Recognition, Hypothesis-Driven Research, Evolutionary Branching, Large Language Models, Confidence Scores, CIFAR-10, MedMNIST, Trajectory Tree, Hypothesis Memory Bank
Authors
Jaywon Koo, Jefferson Hernandez, Ruozhen He, Hanjie Chen, Chen Wei, Vicente Ordonez
Abstract
We introduce HypoExplore, an agentic framework that formulates neural architecture discovery for visual recognition as a hypothesis-driven scientific inquiry. Given a human-specified high-level research direction, HypoExplore ideates, implements, evaluates, and improves neural architectures through evolutionary branching. New hypotheses are created using a large language model by selecting a parent hypothesis to build upon, guided by a dual strategy that balances exploiting validated principles with resolving uncertain ones. Our proposed framework maintains a Trajectory Tree that records the lineage of all proposed architectures, and a Hypothesis Memory Bank that actively tracks confidence scores acquired through experimental evidence. After each experiment, multiple feedback agents analyze the results from different perspectives and consolidate their findings into hypothesis confidence updates. We test our framework on discovering lightweight vision architectures on CIFAR-10, where the best architecture reaches 94.11% accuracy, evolved from a root-node baseline that starts at 18.91%, and generalizes to CIFAR-100 and Tiny-ImageNet. We further demonstrate applicability to a specialized domain by conducting independent architecture discovery runs on MedMNIST, which yield state-of-the-art performance. We show that hypothesis confidence scores grow increasingly predictive as evidence accumulates, and that the learned principles transfer across independent evolutionary lineages, suggesting that HypoExplore not only discovers stronger architectures, but can also help build a genuine understanding of the design space.
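The abstract names three interacting pieces: a Trajectory Tree, a Hypothesis Memory Bank with confidence scores, and a parent-selection strategy that balances exploitation against resolving uncertainty. The sketch below is one way these pieces could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the class and function names (`Hypothesis`, `Node`, `update_confidence`, `select_parent`), the Laplace-smoothed confidence formula, the uncertainty measure, the `explore_weight` parameter, and the toy accuracy values. The abstract does not specify any of these.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hypothesis:
    """A design principle plus evidence counts (hypothetical structure)."""
    text: str
    supports: int = 0
    refutes: int = 0

    @property
    def confidence(self) -> float:
        # Assumed formula: Laplace-smoothed share of supporting evidence.
        # An untested hypothesis starts at 0.5.
        return (self.supports + 1) / (self.supports + self.refutes + 2)

    @property
    def uncertainty(self) -> float:
        # Assumed measure: 1.0 at confidence 0.5, falling to 0.0 at 0 or 1.
        return 1.0 - abs(2.0 * self.confidence - 1.0)


@dataclass
class Node:
    """One architecture in the trajectory tree, linked to its parent."""
    name: str
    hypothesis: Hypothesis
    accuracy: float
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str, hypothesis: Hypothesis, accuracy: float) -> "Node":
        child = Node(name, hypothesis, accuracy, parent=self)
        self.children.append(child)
        return child


def update_confidence(hypothesis: Hypothesis, verdicts: List[bool]) -> None:
    """Fold the verdicts of several feedback agents into the evidence counts."""
    for supported in verdicts:
        if supported:
            hypothesis.supports += 1
        else:
            hypothesis.refutes += 1


def select_parent(nodes: List[Node], explore_weight: float = 0.5) -> Node:
    """Pick the node with the highest weighted exploit/explore score."""
    def score(node: Node) -> float:
        exploit = node.accuracy * node.hypothesis.confidence
        explore = node.hypothesis.uncertainty
        return (1.0 - explore_weight) * exploit + explore_weight * explore
    return max(nodes, key=score)


# Toy run with made-up accuracies (not results from the paper).
root = Node("baseline", Hypothesis("a plain conv stack suffices"), accuracy=0.20)
h_res = Hypothesis("residual connections help")
child = root.add_child("residual-variant", h_res, accuracy=0.80)
update_confidence(h_res, [True, True, False])  # three feedback agents

exploit_pick = select_parent([root, child], explore_weight=0.0)
explore_pick = select_parent([root, child], explore_weight=1.0)
print(exploit_pick.name, explore_pick.name)
```

With `explore_weight=0.0` the selection favors the node whose hypothesis is both accurate and well supported. With `explore_weight=1.0` it favors the node whose hypothesis has the least evidence either way. Intermediate weights trade the two off, which is the kind of balance the abstract describes.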