*Result*: Large-span Concrete-Filled Steel Tube void defect identification with imbalanced datasets using a novel interpretable BO-DRGC framework.
*Further Information*
*This paper addresses the detection of internal crown-type void defects in Concrete-Filled Steel Tube (CFST) structures by proposing an interpretable machine learning (ML) framework, termed BO-DRGC. The framework integrates Bayesian optimization, ML, and the Shapley Additive exPlanations (SHAP) tool to tackle the challenge of imbalanced datasets, achieving high-accuracy prediction of void defects while providing interpretability of the prediction results. The results show that Random Oversampling (RO) performs best among the resampling techniques tested. Among six ML models, those based on Bayesian-optimized Decision Tree (DT), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Categorical Boosting (CatBoost) exhibit superior performance. To enhance stability and accuracy, an ensemble model framework named Bayesian Optimization (BO)-DT-RF-GBDT-CatBoost (abbreviated as BO-DRGC) is developed, leveraging a voting mechanism that combines predictions from DT, RF, GBDT, and CatBoost. The BO-DRGC framework achieves a prediction accuracy of 0.895, recall of 0.89, and an Area Under the Curve (AUC) of 0.94. Furthermore, this paper conducts an in-depth interpretability analysis of the model predictions using the SHAP tool, revealing from a global perspective the key features that have the greatest impact on the model's predictions, such as interval transit time, sound velocity, and sound wave amplitude. Additionally, a detailed analysis of the feature contributions in individual sample predictions is provided from a local perspective. Finally, interactive software based on the BO-DRGC framework has been developed for rapid identification of crown-type void defects in CFSTs. In practical engineering, it provides engineers with real-time, intuitive predictions and explanatory analyses, demonstrating high application value.
• Multiple resampling techniques are employed to effectively address the issue of imbalanced data classification.
• The BO-DRGC framework integrates multiple improved ML models, enhancing prediction accuracy and stability.
• The SHAP tool provides global and local explanations and analyses of model predictions.
• The developed CFST void defect prediction software can provide real-time and accurate prediction results in practical engineering.
[ABSTRACT FROM AUTHOR]*
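
The two mechanical steps named in the abstract, random oversampling and voting across several classifiers, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names `random_oversample` and `majority_vote` are hypothetical, hard (majority) voting is an assumption because the abstract does not say whether voting is hard or soft, and the Bayesian optimization, model training, and SHAP steps are omitted.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample with replacement to fill the gap to the majority count.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def majority_vote(predictions):
    """Hard voting (an assumption): `predictions` holds one list of labels
    per model; each sample gets the label most models agree on."""
    voted = []
    for sample_preds in zip(*predictions):
        voted.append(Counter(sample_preds).most_common(1)[0][0])
    return voted


# Toy data: four "no void" samples (0) and one "void" sample (1).
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)

# Stand-in outputs of four already-trained models on three samples.
model_outputs = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 1, 1]]
ensemble = majority_vote(model_outputs)
```

In a real workflow, oversampling would normally be applied to the training split only, so that duplicated minority samples do not leak into the evaluation data.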