How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration
2026-06-02 • Machine Learning
AI summary
The authors address a problem in tuning Random Forest models: increasing the number of trees almost always improves performance, which makes it hard to pick the best number. They propose a method that does not search over the number of trees directly but instead looks for a 'plateau' where adding more trees no longer helps much, using comparisons among three forest sizes. The method adapts during the optimization process and stops once enough trees are used, which makes it both automated and easy to interpret. Their experiments show that the selected number of trees can differ substantially from the usual heuristic: smaller for most classical benchmark datasets, larger for some high-dimensional bioinformatics datasets.
Random Forest, Hyperparameter Optimization, Number of Trees, Out-of-Bag Score, Tree-structured Parzen Estimator (TPE), Early Stopping, Ensemble Size, Plateau Search, Benchmark Datasets, Bioinformatics Datasets
Authors
Vadim Porvatov, Andrey Dukhovny, Andrey Lange
Abstract
Hyperparameter optimization (HPO) for Random Forest faces a specific difficulty in tuning the number of trees: the predictive score typically improves monotonically with ensemble size, so standard methods such as Tree-structured Parzen Estimator (TPE) and Hyperband require a predefined search range and often drive the estimate toward its right boundary. Early-stopping strategies avoid fixing such a range, but can be sensitive to score noise and prone to premature stopping. To address this, we propose an integrated triplet-based plateau-search algorithm that removes the number of trees from the direct TPE search space and still exploits information accumulated across HPO trials. The method adaptively tracks a near-minimal sufficient ensemble size by monitoring relative changes in the out-of-bag (OOB) score across a triplet of forest sizes and shifting this triplet accordingly. This yields an automated and user-interpretable procedure based on a tolerance parameter. We also provide a theoretical analysis: we relate the proposed relative OOB-score criterion to the gap between the current and limiting scores, and derive an asymptotic variance estimate for the corresponding OOB-based absolute relative difference. Experiments show that the selected number of trees can differ substantially from the common heuristic: for most classical benchmark datasets it is smaller, whereas for some high-dimensional bioinformatics datasets, such as Arcene and Dorothea, it is larger. The source code and reproducible experiments are available at https://github.com/lange-am/rf_plateau_hpo.
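The plateau idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm (see their repository for that); it only shows one plausible way to shift a triplet of forest sizes until the relative score change falls below a tolerance. The function names `relative_change` and `plateau_search`, the geometric spacing `factor`, the starting size, the `max_trees` cap, and the stopping rule are all assumptions made for illustration, and the score function is a synthetic stand-in for an OOB score.

```python
from typing import Callable, Dict, Tuple


def relative_change(lower: float, upper: float) -> float:
    """Absolute relative difference between two scores, scaled by the later one."""
    return abs(upper - lower) / max(abs(upper), 1e-12)


def plateau_search(
    score_fn: Callable[[int], float],
    start: int = 16,       # hypothetical starting forest size
    factor: int = 2,       # hypothetical geometric spacing of the triplet
    tol: float = 0.01,     # tolerance on the relative score change
    max_trees: int = 4096, # hypothetical safety cap on the largest forest
) -> Tuple[int, bool]:
    """Shift a triplet (n, n*factor, n*factor**2) to the right until both
    consecutive relative score changes are within `tol`.

    Returns the smallest size of the final triplet and a flag telling
    whether a plateau was actually found before hitting `max_trees`.
    """
    cache: Dict[int, float] = {}

    def score(k: int) -> float:
        # Each forest size is scored at most once; shifted triplets overlap.
        if k not in cache:
            cache[k] = score_fn(k)
        return cache[k]

    n = start
    while n * factor * factor <= max_trees:
        small, mid, large = n, n * factor, n * factor * factor
        s1, s2, s3 = score(small), score(mid), score(large)
        if relative_change(s1, s2) <= tol and relative_change(s2, s3) <= tol:
            return small, True
        n = mid  # no plateau yet: shift the triplet one step to the right
    return n, False


# Synthetic, noise-free stand-in for an OOB score that saturates at 0.9.
def toy_score(n_trees: int) -> float:
    return 0.9 - 0.5 / n_trees


print(plateau_search(toy_score, start=16, tol=0.01))  # (32, True)
```

With a real model, `score_fn` could fit `sklearn.ensemble.RandomForestClassifier(n_estimators=k, oob_score=True)` and return its `oob_score_` attribute. Real OOB scores are noisy, which is the sensitivity the abstract attributes to simple early-stopping rules, so a sketch this plain would likely need the safeguards the paper develops.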