*Result*: Atrous spatial pyramid pooling with swin transformer model for classification of gastrointestinal tract diseases from videos with enhanced explainability.
*Further Information*
*Accurate and early identification of gastrointestinal (GI) lesions is crucial for treating and preventing GI diseases, including cancer. Automated computer-aided diagnosis methods can assist physicians in early and accurate detection. Video classification of GI endoscopic videos is challenging due to the complexity and variability of visual data. This research proposes a novel method for classifying GI diseases using endoscopic videos. Leveraging the public HyperKvasir dataset, we applied preprocessing algorithms to enhance GI frames by removing noise and artifacts with morphological opening and closing techniques, ensuring high-quality visuals. We addressed dataset imbalance by proposing a novel algorithm. Our hybrid model, Atrous Spatial Pyramid Pooling with Swin Transformer (ASPPST), combines advanced Convolutional Neural Networks and the Swin Transformer to classify GI videos into 30 distinct classes. We incorporated Gradient-Class Activation Mapping (Grad-CAM) in ASPPST's final layer to improve model explainability. The proposed model achieved 97.49 % accuracy in classifying 30 GI diseases, outperforming other transfer learning models and transformers by 8.04 % and 3.99 %, respectively. It also demonstrated a precision of 97.80 %, recall of 97.77 %, and an F1 score of 97.75 %, showcasing robustness across metrics. The high accuracy of ASPPST makes it suitable for real-world use, delivering fewer errors and more precise results in GI endoscopy video classification. Our approach advances artificial intelligence (AI) in computer vision and deep learning for biomedical engineering applications. Grad-CAM integration enhances transparency, boosting clinician trust and adoption of AI tools in diagnostic workflows. • Efficient preprocessing methods were employed to remove noise and artifacts, enabling the extraction of significant features. • To address data imbalance, an under-sampling strategy was proposed. • A highly accurate hybrid deep learning model was developed to classify gastrointestinal diseases into 30 distinct classes. [ABSTRACT FROM AUTHOR]*