Result: LPDiag: LLM-Enhanced Multimodal Prototype Learning Framework for Intelligent Tomato Leaf Disease Diagnosis.
Tomato leaf diseases exhibit subtle inter-class differences and substantial intra-class variability, making accurate identification challenging for conventional deep learning models, especially under real-world conditions with diverse lighting, occlusion, and growth stages. Moreover, most existing approaches rely solely on visual features and lack the ability to incorporate semantic descriptions or expert knowledge, limiting their robustness and interpretability. To address these issues, we propose LPDiag, a multimodal prototype-attention diagnostic framework that integrates large language models (LLMs) for fine-grained recognition of tomato diseases. The framework first employs an LLM-driven semantic understanding module to encode symptom-aware textual embeddings from disease descriptions. These embeddings are then aligned with multi-scale visual features extracted by an enhanced Res2Net backbone, enabling cross-modal representation learning. A set of learnable prototype vectors, combined with a knowledge-enhanced attention mechanism, further strengthens the interaction between visual patterns and LLM prior knowledge, resulting in more discriminative and interpretable representations. Additionally, we develop an interactive diagnostic system that supports natural-language querying and image-based identification, facilitating practical deployment in heterogeneous agricultural environments. Extensive experiments on three widely used datasets demonstrate that LPDiag achieves a mean accuracy of 98.83%, outperforming state-of-the-art models while offering improved explanatory capability. The proposed framework offers a promising direction for integrating LLM-based semantic reasoning with visual perception to enhance intelligent and trustworthy plant disease diagnostics. [ABSTRACT FROM AUTHOR]
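The abstract describes the pipeline only at a high level, so the following is a toy sketch of the general idea and not the authors' implementation. It assumes a single visual feature vector (standing in for the Res2Net output), a small set of prototype vectors (learned in the paper, hand-written here), and one text embedding per disease class (standing in for the LLM-derived symptom embeddings). The visual feature attends over the prototypes, the attended result is fused with it by simple addition, and the fused vector is scored against each class's text embedding by cosine similarity. All function names, labels, and numbers are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_attention(visual, prototypes):
    """Scaled dot-product attention from one visual vector over the prototypes."""
    scale = math.sqrt(len(visual))
    weights = softmax([dot(visual, p) / scale for p in prototypes])
    attended = [
        sum(w * p[i] for w, p in zip(weights, prototypes))
        for i in range(len(visual))
    ]
    return weights, attended


def classify(visual, prototypes, text_embeddings):
    """Fuse the visual feature with its attended prototypes (assumed: plain
    addition), then score the result against per-class text embeddings."""
    weights, attended = prototype_attention(visual, prototypes)
    fused = l2_normalize([v + a for v, a in zip(visual, attended)])
    labels = list(text_embeddings)
    sims = [dot(fused, l2_normalize(text_embeddings[k])) for k in labels]
    return dict(zip(labels, softmax(sims))), weights


# Hypothetical 3-dimensional toy inputs; real embeddings would be far larger.
visual = [0.9, 0.1, 0.0]
prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text_embeddings = {"early_blight": [1.0, 0.0, 0.0], "healthy": [0.0, 1.0, 0.0]}

probs, weights = classify(visual, prototypes, text_embeddings)
print(probs, weights)
```

The attention weights show which prototypes a prediction leaned on, which is one plausible reading of the interpretability the abstract claims; the paper's knowledge-enhanced attention and its training objective are not specified in the abstract and are not modeled here.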
Copyright of Agriculture; Basel is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)