
Bankfull and Mean-flow Channel Geometry Estimation through a Hybrid Multi-Regression and Machine Learning Algorithms across the CONtiguous United States (CONUS)
  • Reihaneh Zarrabi,
  • Riley McDermott,
  • Seyed Mohammad Hassan Erfani,
  • Sagy Cohen
Reihaneh Zarrabi
University of Alabama
Riley McDermott
University of Alabama
Seyed Mohammad Hassan Erfani
University of South Carolina
Sagy Cohen
University of Alabama, Tuscaloosa

Corresponding Author: [email protected]


Abstract

Widely adopted models for estimating channel geometry attributes rely on simplistic power-law (hydraulic geometry) equations. This study presents a new generation of channel geometry models based on a hybrid approach that combines a traditional statistical method (Multi-Linear Regression (MLR)) with advanced tree-based Machine Learning (ML) algorithms (Random Forest Regression (RFR) and eXtreme Gradient Boosting Regression (XGBR)), utilizing novel datasets. To achieve this, a new preprocessing method was applied to refine an extensive observational dataset, namely the HYDRoacoustic dataset supporting Surface Water Oceanographic Topography (HYDRoSWOT). This process improved data quality and identified observations representing bankfull and mean-flow conditions. A compiled dataset, combining the preprocessed dataset with datasets containing additional catchment attributes, such as the National Hydrography Dataset Plus (NHDPlusV2.1), was then used to train a suite of models to predict channel width and depth under bankfull and mean-flow conditions. The analysis shows that tree-based ML algorithms outperform traditional statistical methods in accuracy and in handling the data, but they are limited in predicting for streams whose characteristics fall outside the training range. Consequently, a hybrid method was selected, combining XGBR for streams within the dataset range and MLR for those outside it. Two tiers of models were developed for each attribute using discharges derived from distinct sources (HYDRoSWOT and NHDPlusV2.1, respectively); the second tier of models is applicable to approximately 2.6 million streams within NHDPlusV2.1. Comprehensive independent evaluations were conducted to assess the capability of the developed models to provide stream/reach-averaged (rather than at-a-station) predictions for locations outside the training and testing datasets.
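The hybrid selection described in the abstract (a tree-based model inside the training range, a regression model outside it) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single predictor (discharge), uses a log-log power-law fit as a simplified stand-in for the MLR model, and treats the in-range model as any callable supplied by the user, for example a trained XGBR model's prediction function. The names `fit_power_law` and `make_hybrid_predictor` and the discharge-only range check are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def fit_power_law(q: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of w = a * q**b in log-log space; returns (a, b).

    Simplified stand-in for the MLR fallback (assumption: one predictor).
    """
    if any(v <= 0 for v in q) or any(v <= 0 for v in w):
        raise ValueError("power-law fit needs strictly positive values")
    x = [math.log(v) for v in q]
    y = [math.log(v) for v in w]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                        # exponent (slope in log space)
    a = math.exp(mean_y - b * mean_x)    # coefficient (intercept in log space)
    return a, b


def make_hybrid_predictor(
    q_train: Sequence[float],
    in_range_model: Callable[[float], float],
    a: float,
    b: float,
) -> Callable[[float], float]:
    """Build a predictor that dispatches on the training discharge range.

    Assumption: "within the dataset range" is judged on discharge alone.
    """
    q_min, q_max = min(q_train), max(q_train)

    def predict(q: float) -> float:
        if q_min <= q <= q_max:
            return in_range_model(q)     # e.g. a trained tree-based model
        return a * q ** b                # regression fallback outside the range

    return predict


# Synthetic illustration only: widths generated from w = 2 * q**0.5.
q_obs = [1.0, 10.0, 100.0]
w_obs = [2.0 * q ** 0.5 for q in q_obs]
a_fit, b_fit = fit_power_law(q_obs, w_obs)

# Placeholder in-range model returning a sentinel value, so the dispatch is visible.
predict_width = make_hybrid_predictor(q_obs, lambda q: 99.0, a_fit, b_fit)
```

With this setup, `predict_width(10.0)` is routed to the in-range callable, while `predict_width(1000.0)` falls back to the fitted power law because 1000 exceeds the largest training discharge.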
30 May 2024: Submitted to ESS Open Archive
30 May 2024: Published in ESS Open Archive