*Result*: An explainable hybrid framework for early detection of cardiovascular diseases using Categorical Boosting and Bees algorithm.

Title:
An explainable hybrid framework for early detection of cardiovascular diseases using Categorical Boosting and Bees algorithm.
Authors:
Sen J; School of Computer Science Engineering and Information Systems (SCORE), Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India. jayanta.sen@vit.ac.in
Bhattacharya S; School of Computer Science Engineering and Information Systems (SCORE), Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India. sweta.b@vit.ac.in
Source:
Scientific reports [Sci Rep] 2025 Dec 13; Vol. 15 (1), pp. 45748. Date of Electronic Publication: 2025 Dec 13.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Nature Publishing Group Country of Publication: England NLM ID: 101563288 Publication Model: Electronic Cited Medium: Internet ISSN: 2045-2322 (Electronic) Linking ISSN: 20452322 NLM ISO Abbreviation: Sci Rep Subsets: MEDLINE
Imprint Name(s):
Original Publication: London : Nature Publishing Group, copyright 2011-
Contributed Indexing:
Keywords: BEES; Cardiovascular disease; CatBoost; Explainable AI; Machine learning
Entry Date(s):
Date Created: 20251213 Date Completed: 20251231 Latest Revision: 20260103
Update Code:
20260130
PubMed Central ID:
PMC12756275
DOI:
10.1038/s41598-025-28514-4
PMID:
41390781
Database:
MEDLINE

*Further Information*

*Cardiovascular disease (CVD) remains one of the leading causes of death worldwide, claiming millions of lives each year. Early detection of CVD enables healthcare professionals to make informed decisions about a patient's health. Machine learning (ML)-based frameworks have become popular for disease prediction. However, traditional ML models act as "black boxes" whose results lack transparency and interpretability. The objective of the present study is to develop an ML framework that detects CVD with promising accuracy and also provides interpretability for the generated outcomes to support targeted therapies. The study uses the Framingham, Massachusetts CVD dataset, which is publicly available from the Kaggle repository. During data pre-processing, the Random Oversampling (RO) technique is applied to address class imbalance, followed by Pearson correlation analysis to understand the correlation between attributes. Min-Max scaling is then used for data normalization. The pre-processed data is fed into a hybrid ML framework that combines the Categorical Boosting (CatBoost) and Bees algorithms to achieve optimized CVD prediction results. The proposed hybrid model yielded an accuracy of 98.04%, a precision of 97.09%, a recall of 98.96%, an F1-score of 98.02%, and a specificity of 97.16%, with a total execution time of 26.6580 s. The proposed model outperformed contemporary state-of-the-art algorithms on most evaluation metrics. Additionally, Explainable Artificial Intelligence (XAI) techniques, such as LIME and SHAP, are applied to identify the contribution of the most significant attributes to the occurrence of CVD, offering insight into the detection of the disease and helping healthcare providers make accurate and timely treatment decisions.
(© 2025. The Author(s).)*
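
The pre-processing steps and evaluation metrics named in the abstract can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the function names, the toy feature rows, and the seed are hypothetical, and the CatBoost model, the Bees algorithm search, and the LIME/SHAP analysis are deliberately left out. It only shows what random oversampling, Pearson correlation, Min-Max scaling, and the five reported metrics compute.

```python
import random
from math import sqrt


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement from the existing rows of this class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def min_max_scale(rows):
    """Rescale every column to [0, 1]; constant columns are mapped to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp, fp, tn, fn):
    """The five metrics reported in the abstract, from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        "specificity": tn / (tn + fp),
    }


# Hypothetical toy rows: [age, systolic blood pressure]; label 1 = CVD.
toy_rows = [[39, 106.0], [46, 121.0], [48, 127.5], [61, 150.0]]
toy_labels = [0, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(toy_rows, toy_labels)
scaled_rows = min_max_scale(balanced_rows)
age_bp_correlation = pearson([r[0] for r in toy_rows], [r[1] for r in toy_rows])
```

As a consistency check on the reported figures, `f1_score(0.9709, 0.9896)` evaluates to roughly 0.9802, which agrees with the F1-score of 98.02% stated in the abstract.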

*Declarations. Competing interests: The authors declare no competing interests.*