*Result*: MedARC: Adaptive multi-agent refinement and collaboration for enhanced medical reasoning in large language models.
*Further Information*
*Background: Large Language Models (LLMs) have shown remarkable potential in medical question answering (QA), yet their deployment in clinical settings remains limited by hallucinations, inconsistent reasoning, and difficulties in handling complex biomedical information.
Methods: To address these challenges, we propose MedARC (Medical Agent Refinement and Collaboration), a novel multi-agent framework that enhances medical QA through structured debate among multiple LLM agents. MedARC introduces two key mechanisms: (1) structured inter-agent summarization to extract and refine key agreements and disagreements across agents, and (2) confidence-aware aggregation to synthesize final answers based on the most reliable and well-reasoned contributions.
Results: We evaluate MedARC on three representative medical QA benchmarks, spanning yes/no, factoid, and open-ended QA tasks. Experimental results demonstrate that MedARC significantly improves performance over zero-shot and chain-of-thought (CoT) prompting baselines. For instance, in the yes/no QA task on PubMedQA, MedARC boosts accuracy from 72.9% (zero-shot) to 77.2%, a gain of 4.3 percentage points, using DeepSeek-V3 as the backbone model. Human evaluation further confirms gains in factual consistency and completeness with MedARC. We conduct extensive ablation studies and sensitivity analyses, revealing that MedARC's structured summarization and aggregation modules contribute independently to performance improvements. Additionally, we explore the impact of heterogeneous agent configurations and varying numbers of debate rounds on overall system effectiveness.
Conclusions: Overall, MedARC provides a reliable and scalable solution for enhancing LLM-based medical QA systems. The code for this study is publicly available on GitHub (https://github.com/asdmiao/MedARC).
(Copyright © 2025 Elsevier B.V. All rights reserved.)*
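The two mechanisms named under Methods can be illustrated with a minimal sketch. The abstract does not specify how summaries are built or how confidence is scored, so everything below is an assumption made for illustration: the `AgentResponse` structure, the self-reported confidence in [0, 1], and the rule of summing confidence per answer are hypothetical and are not taken from the MedARC code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentResponse:
    """One agent's answer after a debate round (hypothetical structure)."""
    agent: str
    answer: str        # e.g. "yes" / "no" / "maybe" for a PubMedQA-style question
    confidence: float  # assumed self-reported confidence in [0, 1]


def summarize(responses):
    """Group agents by normalized answer to expose agreements and disagreements."""
    groups = defaultdict(list)
    for r in responses:
        groups[r.answer.strip().lower()].append(r.agent)
    return dict(groups)


def aggregate(responses):
    """Return the answer with the largest summed confidence (assumed rule)."""
    if not responses:
        raise ValueError("need at least one response")
    scores = defaultdict(float)
    for r in responses:
        scores[r.answer.strip().lower()] += r.confidence
    # Highest score wins; ties break alphabetically so the result is deterministic.
    best = min(scores, key=lambda a: (-scores[a], a))
    return best, dict(scores)


responses = [
    AgentResponse("agent_a", "yes", 0.9),
    AgentResponse("agent_b", "no", 0.4),
    AgentResponse("agent_c", "No", 0.3),
]

groups = summarize(responses)
final_answer, scores = aggregate(responses)
print(groups)        # which agents agree with each other
print(final_answer)  # → yes
```

In this toy input a plain majority vote would return "no" (two agents against one), while the confidence-weighted rule returns "yes" because the single confident agent outweighs the two less certain ones. That difference is the point of weighting by confidence; whether MedARC weights contributions this way is not stated in the abstract.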
*Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.*