*Result*: MS-CoTF: Multi-scale chain-of-thought fusion for interpretable biological reasoning with large language models.
Original Publication: New York, Pergamon Press.
*Further Information*
*Large language models (LLMs) have demonstrated impressive proficiency in various science and engineering applications. However, due to the innate multi-scale property of biological systems, existing LLMs face severe limitations in capturing hierarchical relationships and context-dependent interactions across molecular, cellular, tissue, and systemic levels. These models often lack the architectural mechanisms needed to reason effectively across different biological scales, resulting in reduced accuracy and limited interpretability when applied to complex tasks. Here, we introduce a novel framework named multi-scale chain-of-thought fusion (MS-CoTF), which fuses reasoning at molecular, cellular, tissue, and system scales to enhance accuracy and interpretability when solving biological tasks. Through adaptive reasoning depth control, multi-scale integration, bi-directional flow and dynamic fusion strategies, our MS-CoTF model effectively processes queries of varying complexity, enabling scalable and interpretable reasoning across multiple biological levels. Ablation studies demonstrate that these components function synergistically to enhance model accuracy while simultaneously providing biologically meaningful insights. Furthermore, our MS-CoTF model consistently outperforms state-of-the-art reasoning models by 10-15% across three benchmark problems and two case studies in terms of accuracy, expert ratings, and the capacity to produce reasonable inference chains. Technically, MS-CoTF orchestrates a frozen biomedical LLM backbone with trainable cross-scale modules, employing a precise definition of per-step chain-of-thought (CoT) construction and linking. To ensure rigorous evaluation, we implement an explicit dataset splitting protocol (entity-disjoint and temporal) and utilize the Reasoning Coherence Score strictly as a post-hoc metric to ensure fair comparisons. 
We further validate the framework through extended baselines, including structure-conditioned and multimodal biomedical LLMs, alongside detailed human evaluation protocols and hallucination stress tests.
(Copyright © 2026 Elsevier Ltd. All rights reserved.)*
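The abstract mentions an entity-disjoint and temporal dataset splitting protocol without giving its details. As a rough illustration only, one way such a split could be sketched is shown below. The `Record` fields, the function name, and the cutoff rule are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    entity_id: str  # hypothetical field: e.g. a gene or protein identifier
    year: int       # hypothetical field: publication or annotation year
    query: str      # hypothetical field: the task text


def entity_disjoint_temporal_split(records, cutoff_year):
    """Illustrative split (an assumption, not the authors' protocol).

    Test set: every record dated on or after cutoff_year (temporal).
    Train set: earlier records whose entity never appears in the test
    set (entity-disjoint).
    """
    test = [r for r in records if r.year >= cutoff_year]
    test_entities = {r.entity_id for r in test}
    train = [
        r for r in records
        if r.year < cutoff_year and r.entity_id not in test_entities
    ]
    return train, test


records = [
    Record("TP53", 2018, "q1"),
    Record("TP53", 2023, "q2"),
    Record("BRCA1", 2019, "q3"),
    Record("EGFR", 2024, "q4"),
]
train, test = entity_disjoint_temporal_split(records, cutoff_year=2022)
```

In this toy example the 2018 TP53 record is dropped from training because TP53 also appears after the cutoff, so no entity is shared between the two sets. Dropping such records trades training data for a stricter guarantee against leakage.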
*Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.*