*Result*: MedARC: Adaptive multi-agent refinement and collaboration for enhanced medical reasoning in large language models.

Title:
MedARC: Adaptive multi-agent refinement and collaboration for enhanced medical reasoning in large language models.
Authors:
Miao Y; School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China; Jiangsu Key Laboratory of Intelligent Medical Image Computing, Nanjing University of Information Science and Technology, Nanjing, China.
Wen J; School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China.
Luo Y; School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China; Jiangsu Key Laboratory of Intelligent Medical Image Computing, Nanjing University of Information Science and Technology, Nanjing, China.
Li J; School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China; Jiangsu Key Laboratory of Intelligent Medical Image Computing, Nanjing University of Information Science and Technology, Nanjing, China. Electronic address: li.j@nuist.edu.cn.
Source:
International journal of medical informatics [Int J Med Inform] 2026 Feb; Vol. 206, pp. 106136. Date of Electronic Publication: 2025 Oct 13.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Elsevier Science Ireland Ltd; Country of Publication: Ireland; NLM ID: 9711057; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1872-8243 (Electronic); Linking ISSN: 1386-5056; NLM ISO Abbreviation: Int J Med Inform; Subsets: MEDLINE
Imprint Name(s):
Original Publication: Shannon, Co. Clare, Ireland : Elsevier Science Ireland Ltd., c1997-
Contributed Indexing:
Keywords: Clinical natural language processing; Large language models; Medical question answering; Multi-agent system
Entry Date(s):
Date Created: 20251018; Date Completed: 20251124; Latest Revision: 20251124
Update Code:
20260130
DOI:
10.1016/j.ijmedinf.2025.106136
PMID:
41109093
Database:
MEDLINE

*Further Information*

*Background: Large Language Models (LLMs) have shown remarkable potential in medical question answering (QA), yet their deployment in clinical settings remains limited by hallucinations, inconsistent reasoning, and difficulties in handling complex biomedical information.
Methods: To address these challenges, we propose MedARC (Medical Agent Refinement and Collaboration), a novel multi-agent framework that enhances medical QA through structured debate among multiple LLM agents. MedARC introduces two key mechanisms: (1) structured inter-agent summarization to extract and refine key agreements and disagreements across agents, and (2) confidence-aware aggregation to synthesize final answers based on the most reliable and well-reasoned contributions.
Results: We evaluate MedARC on three representative medical QA benchmarks, spanning yes/no, factoid, and open-ended QA tasks. Experimental results demonstrate that MedARC significantly improves performance over zero-shot and chain-of-thought (CoT) prompting baselines. For instance, in the yes/no QA task on PubMedQA, MedARC raises accuracy from 72.9% (zero-shot) to 77.2%, a gain of 4.3 percentage points, using DeepSeek-V3 as the backbone model. Human evaluation further confirms gains in factual consistency and completeness with MedARC. Extensive ablation studies and sensitivity analyses show that MedARC's structured summarization and aggregation modules contribute independently to performance improvements. Additionally, we explore the impact of heterogeneous agent configurations and varying numbers of debate rounds on overall system effectiveness.
Conclusions: Overall, MedARC provides a reliable and scalable solution for enhancing LLM-based medical QA systems. The code for this study is publicly available on GitHub (https://github.com/asdmiao/MedARC).
(Copyright © 2025 Elsevier B.V. All rights reserved.)*
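
The two mechanisms named in the Methods can be illustrated with a minimal sketch. The abstract does not say how agent confidence is obtained or how contributions are combined, so everything below is an assumption: `AgentAnswer`, `summarize`, and `aggregate` are hypothetical names, confidence is taken to be a self-reported score in [0, 1], and plain confidence-weighted voting stands in for the paper's aggregation rule. It is not taken from the MedARC repository.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentAnswer:
    """One agent's answer after a debate round (hypothetical structure)."""
    agent: str
    answer: str
    confidence: float  # assumed: self-reported score in [0, 1]


def _normalize(answer: str) -> str:
    # Treat "Yes", " yes " and "yes" as the same answer.
    return answer.strip().lower()


def summarize(answers: list[AgentAnswer]) -> dict[str, list[str]]:
    """Group agent names by normalized answer.

    A single key means the agents agree; several keys expose the
    disagreement that a further debate round would need to resolve.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for a in answers:
        groups[_normalize(a.answer)].append(a.agent)
    return dict(groups)


def aggregate(answers: list[AgentAnswer]) -> str:
    """Return the answer with the largest summed confidence.

    Assumption: weighted voting, not the paper's actual rule.
    """
    if not answers:
        raise ValueError("need at least one agent answer")
    totals: dict[str, float] = defaultdict(float)
    for a in answers:
        if not 0.0 <= a.confidence <= 1.0:
            raise ValueError(f"confidence out of range for {a.agent}")
        totals[_normalize(a.answer)] += a.confidence
    # On a tie, max keeps the answer that was seen first.
    return max(totals, key=totals.get)


if __name__ == "__main__":
    round_answers = [
        AgentAnswer("agent_1", "yes", 0.6),
        AgentAnswer("agent_2", "no", 0.9),
        AgentAnswer("agent_3", "Yes", 0.5),
    ]
    print(summarize(round_answers))
    print(aggregate(round_answers))
```

In this toy round the single most confident agent answers "no", yet the two "yes" agents together carry more weight (1.1 against 0.9), so the weighted vote returns "yes". A max-confidence rule would pick "no" instead; which behaviour is closer to MedARC's is not stated in the abstract.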

*Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.*