Title:
Large Language Model Agent for Modular Task Execution in Drug Discovery.
Authors:
Ock J; Department of Chemical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, United States.; Department of Chemical and Biomolecular Engineering, University of Nebraska–Lincoln, Lincoln, Nebraska 68588, United States., Meda RS; Department of Chemical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, United States., Badrinarayanan S; Department of Chemical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, United States., Aluru NS; School of Engineering Medicine, Texas A&M University, Houston, Texas 77030, United States., Chandrasekhar A; Department of Material Science and Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, United States., Barati Farimani A; Department of Mechanical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, United States.
Source:
Journal of chemical information and modeling [J Chem Inf Model] 2026 Feb 23; Vol. 66 (4), pp. 2055-2068. Date of Electronic Publication: 2026 Feb 09.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: American Chemical Society Country of Publication: United States NLM ID: 101230060 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1549-960X (Electronic) Linking ISSN: 15499596 NLM ISO Abbreviation: J Chem Inf Model Subsets: MEDLINE
Imprint Name(s):
Original Publication: Washington, D.C. : American Chemical Society, c2005-
Substance Nomenclature:
0 (Ligands)
0 (Proteins)
Entry Date(s):
Date Created: 20260209 Date Completed: 20260223 Latest Revision: 20260227
Update Code:
20260227
PubMed Central ID:
PMC12933718
DOI:
10.1021/acs.jcim.5c02454
PMID:
41662220
Database:
MEDLINE

*Further Information*

*We present a modular framework powered by large language models (LLMs) that automates and streamlines key tasks across the early-stage computational drug discovery pipeline. By combining LLM reasoning with domain-specific tools, the framework performs biomedical data retrieval, literature-grounded question answering via retrieval-augmented generation, molecular generation, multiproperty prediction, property-aware molecular refinement, and 3D protein-ligand structure generation. The agent autonomously retrieves relevant biomolecular information, including FASTA sequences, SMILES representations, and literature, and answers mechanistic questions with improved contextual accuracy compared to standard LLMs. It then generates chemically diverse seed molecules and predicts 75 properties, including ADMET-related and general physicochemical descriptors, which guide iterative molecular refinement. Across two refinement rounds, the number of molecules with QED > 0.6 increased from 34 to 55. The number of molecules satisfying empirical drug-likeness filters also rose; for example, compliance with the Ghose filter increased from 32 to 55 within a pool of 100 molecules. The framework also employs Boltz-2 to generate 3D protein-ligand complexes and provide rapid binding affinity estimates for candidate compounds. These results demonstrate that the approach effectively supports molecular screening, prioritization, and structure evaluation. Its modular design enables flexible integration of evolving tools and models, providing a scalable foundation for AI-assisted therapeutic discovery.*
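The abstract reports two pool-level metrics after refinement: the count of molecules with QED > 0.6 and the count passing the Ghose filter. The following is a minimal sketch of how such counts could be tallied, assuming the descriptors (QED, molecular weight, logP, molar refractivity, total atom count including hydrogens) have already been computed by some cheminformatics toolkit. The `MolDescriptors` container, the `passes_ghose` and `summarize_pool` helpers, and the toy values in `pool` are all hypothetical and do not come from the paper. The Ghose ranges used are the commonly cited ones (MW 160 to 480, logP -0.4 to 5.6, molar refractivity 40 to 130, 20 to 70 atoms); whether the authors apply inclusive bounds is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MolDescriptors:
    """Hypothetical container for precomputed descriptors of one molecule."""
    name: str
    qed: float                  # quantitative estimate of drug-likeness, 0..1
    mol_weight: float           # molecular weight in Da
    logp: float                 # octanol/water partition coefficient estimate
    molar_refractivity: float   # molar refractivity
    n_atoms: int                # total atom count, hydrogens included


def passes_ghose(d: MolDescriptors) -> bool:
    """Commonly cited Ghose ranges; inclusive bounds are an assumption."""
    return (160 <= d.mol_weight <= 480
            and -0.4 <= d.logp <= 5.6
            and 40 <= d.molar_refractivity <= 130
            and 20 <= d.n_atoms <= 70)


def summarize_pool(pool, qed_threshold=0.6):
    """Count molecules with QED strictly above the threshold and Ghose passes."""
    n_qed = sum(1 for d in pool if d.qed > qed_threshold)
    n_ghose = sum(1 for d in pool if passes_ghose(d))
    return {"size": len(pool), "qed_gt_threshold": n_qed, "ghose_pass": n_ghose}


# Toy, made-up descriptor values for illustration only (not data from the paper).
pool = [
    MolDescriptors("a", 0.72, 310.4, 2.1, 88.0, 38),
    MolDescriptors("b", 0.45, 520.6, 6.2, 140.0, 75),
    MolDescriptors("c", 0.61, 180.2, 1.2, 47.0, 21),
]

if __name__ == "__main__":
    print(summarize_pool(pool))  # {'size': 3, 'qed_gt_threshold': 2, 'ghose_pass': 2}
```

Running the same summary on the pool before and after each refinement round would yield before/after counts of the kind reported in the abstract (e.g., 34 to 55 for QED > 0.6); keeping the filter as a plain predicate makes it easy to swap in other empirical rules (Lipinski, Veber) in the same modular spirit.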