*Result*: Computer vision-based hybrid efficient convolution for isolated dynamic sign language recognition.

Title:
Computer vision-based hybrid efficient convolution for isolated dynamic sign language recognition.
Source:
Neural Computing & Applications; Nov2024, Vol. 36 Issue 32, p19951-19966, 16p
Database:
Complementary Index

*Further Information*

*Isolated dynamic sign language recognition (IDSLR) has the potential to transform accessibility and inclusion by enabling speech- and/or hearing-impaired people to engage more fully in many spheres of life, including social interactions and work. IDSLR is a challenging task because a single gesture requires analyzing a sequence of image frames with multiple linguistic features, often against cluttered backgrounds and under varying illumination. We propose a Hybrid Efficient Convolution (HEC) model that combines EfficientNet-B3 with a few modified layers as an alternative to traditional machine learning techniques, with improved performance in cluttered backgrounds and illumination-variation environments. The HEC architecture integrates pre-trained EfficientNet-B3 layers loaded with customized weights and a new custom dense layer of 256 units, followed by batch normalization, dropout, and the final output layer. To enhance the robustness of the system, we employed data augmentation during pre-processing. The system then performs a channel-wise feature transformation through point-wise convolution, which reduces computational complexity and increases accuracy. The updated 256-unit dense layer processes the output of the standard EfficientNet-B3, shaping the model into a hybrid form that achieves better performance. We created our own gesture dataset, "BdSL_OPA_23_GESTURES," consisting of 6000 video clips covering 100 isolated dynamic Bangla Sign Language words, with 60 videos per word recorded by 20 different people in cluttered backgrounds with illumination variation, to train and evaluate the proposed model. We use 80% of the dataset for training purposes, while the remaining 20% is dedicated to testing and validation. 
Within a small number of epochs, our proposed HEC model achieves a superior accuracy of 93.17% on our "BdSL_OPA_23_GESTURES" dataset. The proposed model and dataset have been shared publicly with the scientific community at: https://github.com/Prothoma2001/Bangla-Continuous-Sign-Language-Recognition.git. [ABSTRACT FROM AUTHOR]
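The channel-wise feature transformation via point-wise (1×1) convolution mentioned in the abstract can be sketched in NumPy. Note this is an illustrative sketch, not the authors' implementation: the spatial size and the 1536-to-256 channel projection are assumptions (1536 is EfficientNet-B3's top-feature channel count; 256 matches the custom dense layer).

```python
import numpy as np

# Hypothetical feature-map shapes (illustrative, not from the paper):
# an H x W spatial grid with C_in channels, projected to C_out channels.
H, W, C_in, C_out = 10, 10, 1536, 256

rng = np.random.default_rng(0)
features = rng.standard_normal((H, W, C_in))   # backbone feature map
weights = rng.standard_normal((C_in, C_out))   # 1x1 conv kernel, no spatial extent

# A point-wise (1x1) convolution is a per-pixel linear map over channels:
# every spatial location is multiplied by the same C_in x C_out matrix.
pointwise_out = features @ weights             # shape (H, W, C_out)

# Cost comparison: a 1x1 conv needs H*W*C_in*C_out multiply-adds, while a
# 3x3 conv with the same channel counts needs 9x as many -- which is why
# the point-wise transform reduces computational complexity.
flops_1x1 = H * W * C_in * C_out
flops_3x3 = 9 * flops_1x1
print(pointwise_out.shape)     # (10, 10, 256)
print(flops_3x3 // flops_1x1)  # 9
```

Because the 1×1 kernel has no spatial extent, the whole operation reduces to a single matrix multiply over the channel axis, which is how frameworks implement it efficiently.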

Copyright of Neural Computing & Applications is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)