This study develops a machine learning framework for daily streamflow prediction at the Sandia gauging station on India's Narmada River, with a focus on extreme flow events. Four algorithms, Support Vector Machine (SVM), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Artificial Neural Network (ANN), were evaluated using seven predictors, including 1-day and 7-day lagged discharge values and a first-order differential feature (Q_{t-1} - Q_{t-7}), achieving a cross-validation R² of 0.97. Hyperparameters were optimised by grid search with 3-fold cross-validation to support robust performance and limit overfitting. Models were assessed using root mean square error (RMSE), coefficient of determination (R²), Nash-Sutcliffe efficiency (NSE), and correlation across flow regimes. XGBoost outperformed the other models, achieving NSE = 0.996 in training and NSE = 0.975 for 99th-percentile flows, with RF as a consistent second-best performer. ANN generalised well, but its performance degraded under extreme flow conditions. XGBoost showed minimal median bias (0.17 m³/s) and accurately predicted outlier frequencies. Ensemble methods, particularly XGBoost, proved reliable for operational flood forecasting across diverse hydrological conditions. The temporal lag feature engineering captured both immediate and longer-term hydrological responses, improving the accuracy of extreme event prediction.
Streamflow prediction, Machine learning, Extreme flow forecasting, Narmada River, Flood prediction, Temporal lag analysis
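
The lag features and the NSE metric named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `lag_features` and `nse` and the discharge values are hypothetical, and only the three predictors the abstract names explicitly (Q_{t-1}, Q_{t-7}, and their difference) are built, not the full set of seven. NSE follows its standard definition, one minus the ratio of the squared prediction error to the variance of the observations about their mean.

```python
from statistics import mean


def lag_features(q, short_lag=1, long_lag=7):
    """Build ((Q_{t-1}, Q_{t-7}, Q_{t-1} - Q_{t-7}), Q_t) rows from a daily series.

    Rows start at index `long_lag`, the first day for which both lags exist.
    """
    rows = []
    for t in range(long_lag, len(q)):
        q_short = q[t - short_lag]
        q_long = q[t - long_lag]
        rows.append(((q_short, q_long, q_short - q_long), q[t]))
    return rows


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs_mean = mean(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_sum = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - squared_error / variance_sum


# Hypothetical daily discharge in m3/s (illustrative values, not data from the study).
discharge = [120.0, 135.0, 150.0, 410.0, 980.0, 760.0, 540.0, 400.0, 310.0, 260.0]

rows = lag_features(discharge)
observed = [target for _, target in rows]

# Persistence baseline (assumption for illustration): predict Q_t = Q_{t-1}.
persistence = [features[0] for features, _ in rows]
baseline_nse = nse(observed, persistence)
```

In a workflow like the one described, rows of this shape would form the feature matrix passed to each model, and `nse` would be computed separately for each flow regime, for example only on days above the 99th percentile of observed discharge.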