How to out-perform default random forest regression: choosing hyperparameters for applications in large-sample hydrology

Published in Hydrology Research, 2026

Recommended citation: Bilolikar, D. K., More, A., Gong, A., & Janssen, J. (2026). How to out-perform default random forest regression: choosing hyperparameters for applications in large-sample hydrology. Hydrology Research, 57(1), 61-77. https://iwaponline.com/hr/article/57/1/61/110626

Predictions are a central part of water resources research. Despite their strong theoretical basis, physically based models remain difficult to apply effectively to catchment-scale processes, and some important prediction problems are not easily amenable to a first-principles representation. As such, machine learning (ML) models have come to be seen as a valid alternative in recent years. Although such models are readily available, well-optimized state-of-the-art ML strategies are not widely used in water resources research. Further, some analyses require many model trainings, so computational time can prevent proper hyperparameter optimization. To use data effectively to drive scientific advances in the field, it is essential to make ML models accessible to users who may lack a deep understanding of ML by improving automated machine learning resources. ML models such as XGBoost have recently been shown to outperform random forest (RF) models, which are traditionally used in water resources research. In this study, we extensively compare XGBoost and RF on over 150 water-related datasets. This study provides water scientists with access to quick, user-friendly RF and XGBoost model optimization.