ReSyn: Autonomously Scaling Synthetic Environments for Reasoning Models
2026-02-23 • Artificial Intelligence
Artificial Intelligence, Machine Learning
AI summary
The authors developed ReSyn, a pipeline that generates many different reasoning tasks, each paired with an automatic checker (verifier) that tests whether an answer is correct. They used these tasks to train a language model with reinforcement learning, improving its performance on reasoning benchmarks and on out-of-domain math benchmarks. Their ablations show that both verifier feedback and task diversity contribute to the gains. The approach scales reinforcement learning with verifiable rewards by generating verifiers rather than annotated solutions.
reinforcement learning, verifiable rewards, reasoning language models, verifiers, synthetic data generation, constraint satisfaction, algorithmic puzzles, spatial reasoning, Qwen2.5, benchmarking
Authors
Andre He, Nathaniel Weir, Kaj Bostrom, Allen Nie, Darion Cassel, Sam Bayless, Huzefa Rangwala
Abstract
Reinforcement learning with verifiable rewards (RLVR) has emerged as a promising approach for training reasoning language models (RLMs) by leveraging supervision from verifiers. Although verifier implementation is easier than solution annotation for many tasks, existing synthetic data generation methods remain largely solution-centric, while verifier-based methods rely on a few hand-crafted procedural environments. In this work, we scale RLVR by introducing ReSyn, a pipeline that generates diverse reasoning environments equipped with instance generators and verifiers, covering tasks such as constraint satisfaction, algorithmic puzzles, and spatial reasoning. A Qwen2.5-7B-Instruct model trained with RL on ReSyn data achieves consistent gains across reasoning benchmarks and out-of-domain math benchmarks, including a 27% relative improvement on the challenging BBEH benchmark. Ablations show that verifier-based supervision and increased task diversity both contribute significantly, providing empirical evidence that generating reasoning environments at scale can enhance reasoning abilities in RLMs.
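To make the notion of an environment "equipped with instance generators and verifiers" concrete, the following is a minimal sketch of what one such environment might look like. It is not taken from ReSyn: the subset-sum task, the `Instance` class, and the functions `generate_instance`, `render_prompt`, and `verify` are all hypothetical names chosen for illustration. The point it illustrates is the asymmetry noted in the abstract: checking a proposed answer takes a few lines, whereas producing a solution would require a search.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    """A hypothetical subset-sum puzzle (illustrative, not from the paper)."""
    numbers: list[int]
    target: int


def generate_instance(rng: random.Random, size: int = 6) -> Instance:
    """Sample a puzzle that is guaranteed to have at least one solution."""
    numbers = [rng.randint(1, 50) for _ in range(size)]
    # Build the target from a random non-empty subset so a solution exists.
    chosen = rng.sample(range(size), rng.randint(1, size))
    target = sum(numbers[i] for i in chosen)
    return Instance(numbers=numbers, target=target)


def render_prompt(inst: Instance) -> str:
    """Turn an instance into the text shown to the model."""
    return (
        f"Pick distinct 0-based positions in {inst.numbers} whose values "
        f"sum to {inst.target}. Answer with comma-separated indices."
    )


def verify(inst: Instance, answer: str) -> float:
    """Return a binary reward: 1.0 for any valid answer, 0.0 otherwise."""
    try:
        indices = [int(tok) for tok in answer.split(",")]
    except ValueError:
        return 0.0  # unparseable answers earn no reward
    if len(set(indices)) != len(indices):
        return 0.0  # positions must be distinct
    if any(i < 0 or i >= len(inst.numbers) for i in indices):
        return 0.0  # positions must be in range
    total = sum(inst.numbers[i] for i in indices)
    return 1.0 if total == inst.target else 0.0
```

Under this sketch, an RL loop would call `generate_instance` to sample a fresh problem, pass `render_prompt(inst)` to the model, and use `verify(inst, model_output)` as the reward. Because the verifier accepts any valid subset, no reference solution has to be stored with the instance.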