*Title*: Leveraging large language models for heuristic usability assessment of medical software: Insights with the Radiation Planning Assistant.
Original Publication: Reston, VA: American College of Medical Physics, c2000-
Privitera MB, Evans M, Southee D. Human factors in the design of medical devices—Approaches to meeting international standards in the European Union and USA. Appl Ergon. 2017;59:251‐263. doi:10.1016/j.apergo.2016.08.034.
van der Peijl J, Klein J, Grass C, Freudenthal A. Design for risk control: the role of usability engineering in the management of use‐related risks. J Biomed Inform. 2012;45(4):795‐812. doi:10.1016/j.jbi.2012.03.006.
Tase A, Vadhwana B, Buckle P, Hanna GB. Usability challenges in the use of medical devices in the home environment: a systematic review of literature. Appl Ergon. 2022;103:103769. doi:10.1016/j.apergo.2022.103769.
Cardan RA, Covington EL, Popple RA. Code Wisely: risk assessment and mitigation for custom clinical software. J Appl Clin Med Phys. 2021;22(8):273‐279. doi:10.1002/acm2.13348.
Salomons GJ, Kelly D. A survey of Canadian medical physicists: software quality assurance of in‐house software. J Appl Clin Med Phys. 2015;16(1):336‐348. doi:10.1120/jacmp.v16i1.5115.
Cha E, Elguindi S, Onochie I, et al. Clinical implementation of deep learning contour autosegmentation for prostate radiotherapy. Radiother Oncol. 2021;159:1‐7. doi:10.1016/j.radonc.2021.02.040.
Zhang J, Johnson TR, Patel VL, Paige DL, Kubose T. Using usability heuristics to evaluate patient safety of medical devices. J Biomed Inform. 2003;36(1):23‐30. doi:10.1016/S1532‐0464(03)00060‐1.
Jiang M, Liu S, Gao J, Feng Q, Zhang Q. A usability study of 3 radiotherapy systems: a comparative evaluation based on expert evaluation and user experience. Med Sci Monit. 2019;25:578‐589. doi:10.12659/msm.913160.
Chan AJ, Islam MK, Rosewall T, Jaffray DA, Easty AC, Cafazzo JA. Applying usability heuristics to radiotherapy systems. Radiother Oncol. 2012;102(1):142‐147. doi:10.1016/j.radonc.2011.05.077.
Shier AP, Morita PP, Dickie C, Islam M, Burns CM, Cafazzo JA. Design and evaluation of a safety‐centered user interface for radiation therapy. Pract Radiat Oncol. 2018;8(5):e346‐e354. doi:10.1016/j.prro.2018.01.009.
Jiang M, Tu X, Xiao W, et al. Usability testing of radiotherapy systems as a medical device evaluation tool to inform hospital procurement decision‐making. Sci Prog. 2021;104(3):368504211036129. doi:10.1177/00368504211036129.
Gilmore D, Shier A. Usability engineering for a complex medical device: a case study of an MR‐Linac. 2019.
Yang W, Some L, Bain M, Kang B. A comprehensive survey on integrating large language models with knowledge‐based methods. Knowledge‐Based Systems. 2025;318:113503. doi:10.1016/j.knosys.2025.113503.
Maity S, Saikia MJ. Large language models in healthcare and medical applications: a review. Bioengineering (Basel). 2025;12(6). doi:10.3390/bioengineering12060631.
Jang BS, Alcorn SR, McNutt TR, Ehsan U. Hype or reality: utility of large language models in radiation oncology. Int J Radiat Oncol Biol Phys. 2024;120(2, Supplement):e629‐e630. doi:10.1016/j.ijrobp.2024.07.1386.
Zitu MM, Le TD, Duong T, et al. Large language models in cancer: potentials, risks, and safeguards. BJR Artif Intell. 2025;2(1):ubae019. doi:10.1093/bjrai/ubae019.
Court LE, Aggarwal A, Burger H, et al. Radiation planning assistant—a web‐based tool to support high‐quality radiotherapy in clinics with limited resources. J Vis Exp. 2023;(200):e65504. doi:10.3791/65504.
Nealon KA, Balter PA, Douglas RJ, et al. Using failure mode and effects analysis to evaluate risk in the clinical adoption of automated contouring and treatment planning tools. Pract Radiat Oncol. 2022;12(4):e344‐e353. doi:10.1016/j.prro.2022.01.003.
Nealon KA, Douglas RJ, Han EY, et al. Hazard testing to reduce risk in the development of automated planning tools. J Appl Clin Med Phys. 2023;24(8):e13995. doi:10.1002/acm2.13995.
Kisling K, Johnson JL, Simonds H, et al. A risk assessment of automated treatment planning and recommendations for clinical deployment. Med Phys. 2019;46(6):2567‐2574. doi:10.1002/mp.13552.
Court L, Aggarwal A, Burger H, et al. Addressing the global expertise gap in radiation oncology: the radiation planning assistant. JCO Glob Oncol. 2023;9:e2200431. doi:10.1200/go.22.00431.
Court LE. The radiation planning assistant: addressing the global gap in radiotherapy services. Lancet Oncol. 2024;25(3):277‐278. doi:10.1016/S1470‐2045(24)00084‐6.
Court LE, Aggarwal A, Jhingran A, et al. Artificial intelligence‐based radiotherapy contouring and planning to improve global access to cancer care. JCO Glob Oncol. 2024;10:e2300376. doi:10.1200/GO.23.00376.
*Further Information*
*Background: Usability engineering is essential for ensuring the safety and effectiveness of medical software, as design-related issues are a leading cause of use errors in clinical settings. Heuristic evaluation provides a practical approach to identifying usability problems, but its outcomes depend heavily on expert interpretation. Large Language Models (LLMs), such as ChatGPT, offer a potential means to augment heuristic evaluation by generating structured, context-aware usability feedback. This study explored the use of ChatGPT to support heuristic assessment of the Radiation Planning Assistant (RPA), a web-based radiotherapy planning tool designed to support clinical teams in low- and middle-income countries.
Methods: ChatGPT was provided with the RPA user and technical guides, training videos for each functional dashboard, and Zhang et al.'s 14 usability heuristics. The model was instructed to score each dashboard according to these heuristics, using Zhang's 0-4 severity scale, and to propose concrete interface improvements. The resulting feedback was reviewed and scored independently by the RPA developer team and by 13 users during a dedicated User Meeting. Comparative analysis was performed between ChatGPT, developer, and user ratings.
Results: ChatGPT identified 26 potential usability issues across six heuristic domains. The developer team considered nine of these actionable, though all were classified as minor (severity ≤ 2). User ratings showed wide variability, with nine suggestions achieving mean scores ≥ 1.5. Qualitative agreement between users and developers was limited, underscoring the importance of diverse perspectives in heuristic evaluation. Three suggestions (enhanced upload logs, reversible actions such as a "reopen request" option, and stronger error prevention) were rated as potentially high priority by a minority of users. ChatGPT's ratings were consistent across dashboards.
Conclusions: While ChatGPT did not reveal any critical usability failures, its heuristic assessment proved valuable in prompting discussion, identifying minor refinements, and enriching both developer and user engagement with the RPA's interface design. This study demonstrates that LLMs can serve as an effective, low-cost complement to conventional heuristic evaluation, supporting early-stage usability review and stakeholder dialogue in the development of medical software.
(© 2026 The Author(s). Journal of Applied Clinical Medical Physics published by Wiley Periodicals LLC on behalf of American Association of Physicists in Medicine.)*