TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

2026-04-15 | Artificial Intelligence

Artificial Intelligence, Computation and Language
AI summary

The authors present TREX, a system that uses multiple AI agents to fully automate the process of training large language models (LLMs). TREX coordinates tasks like understanding requirements, researching data and literature, deciding training strategies, and running the training itself. It treats the sequence of training trials as a search tree, helping the system learn from previous attempts and improve over time. The authors also created FT-Bench, a set of 10 real-world tasks to test how well TREX can optimize different aspects of LLMs, and found that TREX reliably improves model performance.

Large Language Models, Multi-agent Systems, Model Training, Automated Machine Learning, Search Tree, Training Strategies, Benchmarking, Model Evaluation, Data Preparation, Iterative Optimization
Authors
Zerun Ma, Guoqiang Wang, Xinchen Xie, Yicheng Chen, He Du, Bowen Li, Yanan Sun, Wenran Liu, Kai Chen, Yining Li
Abstract
While Large Language Models (LLMs) have empowered AI research agents to perform isolated scientific tasks, automating complex, real-world workflows such as LLM training remains a significant challenge. In this paper, we introduce TREX, a multi-agent system that automates the entire LLM training lifecycle. By orchestrating collaboration between two core modules, the Researcher and the Executor, the system performs requirement analysis, open-domain literature and data research, formulation of training strategies, preparation of data recipes, and model training and evaluation. The multi-round experimental process is modeled as a search tree, enabling the system to efficiently plan exploration paths, reuse historical results, and distill high-level insights from iterative trials. To evaluate the capability of automated LLM training, we construct FT-Bench, a benchmark comprising 10 tasks derived from real-world scenarios, ranging from optimizing fundamental model capabilities to enhancing performance on domain-specific tasks. Experimental results demonstrate that the TREX agent consistently improves model performance on the target tasks.
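
The idea of modeling multi-round experiments as a search tree can be illustrated with a minimal Python sketch. The abstract does not specify TREX's actual data structures or selection policy, so everything here is an assumption made for illustration: the `Trial` class, its fields, and the helpers `evaluated` and `best_trial` are hypothetical names, and the scores are made-up placeholder values rather than results from the paper.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity; the tree contains parent/child cycles
class Trial:
    """One training trial, stored as a node in the search tree (hypothetical)."""
    recipe: str                                   # short description of the recipe tried
    score: Optional[float] = None                 # evaluation score; None until the trial runs
    parent: Optional["Trial"] = field(default=None, repr=False)
    children: list["Trial"] = field(default_factory=list)

    def add_child(self, recipe: str) -> "Trial":
        """Branch a new trial that refines this one."""
        child = Trial(recipe=recipe, parent=self)
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """Recipes from the root down to this node, i.e. the trial's history."""
        node, recipes = self, []
        while node is not None:
            recipes.append(node.recipe)
            node = node.parent
        return list(reversed(recipes))


def evaluated(root: Trial) -> list[Trial]:
    """Collect every node that already has a score (depth-first traversal)."""
    stack, found = [root], []
    while stack:
        node = stack.pop()
        if node.score is not None:
            found.append(node)
        stack.extend(node.children)
    return found


def best_trial(root: Trial) -> Trial:
    """Return the highest-scoring evaluated trial, a natural node to expand next."""
    return max(evaluated(root), key=lambda t: t.score)


# Placeholder scores, purely illustrative.
root = Trial("base model", score=0.40)
a = root.add_child("add math data")
a.score = 0.55
b = root.add_child("add code data")
b.score = 0.48
c = a.add_child("add math data + lower lr")  # planned but not yet run

print(best_trial(root).recipe)
print(c.path())
```

Keeping a parent link on every node is what makes historical results reusable in this sketch: any planned trial can recover the full chain of decisions that led to it, and a greedy policy such as `best_trial` is only one of many ways a system could choose where to explore next.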