*Title*: Automated detection of stigmatizing language in Electronic Health Records (EHRs) using a multi-stage transfer learning approach.
*Abstract*
Objective: Stigmatizing language (SL) in Electronic Health Records (EHRs) can perpetuate biases and negatively impact patient care. This study introduces a novel method for automatically detecting such language to improve healthcare documentation practices.
Materials and Methods: We developed a multi-stage transfer learning framework integrating semantic, syntactic, and task adaptation using three datasets: hate speech, clinical phenotypes, and stigmatizing language. Experiments were conducted on the stigmatizing language dataset, which consists of 4,129 de-identified EHR notes (72.7% stigmatizing, 27.3% non-stigmatizing), split 80/20 for training and testing. Longformer, BERT, and ClinicalBERT models were evaluated, and model performance was assessed on 35 randomized subsets of the test set (each comprising 70% of the test data). The Wilcoxon-Mann-Whitney test was used to evaluate statistical significance, with Bonferroni correction applied to control for multiple hypothesis testing. Baseline models included zero-shot and few-shot GPT-4o, Support Vector Machine, Random Forest, Logistic Regression, and Multinomial Naive Bayes.
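The subset-based significance testing described above can be illustrated with a minimal Python sketch. This is not the authors' code: the helper `subset_accuracies`, the per-note correctness arrays, and the placeholder accuracy levels are hypothetical assumptions; only the protocol (35 random draws of 70% of the test notes, a Wilcoxon-Mann-Whitney test via SciPy's `mannwhitneyu`, and a Bonferroni-adjusted threshold) follows the description.

```python
# A minimal sketch of the subset-based evaluation, assuming precomputed
# per-note correctness arrays for the proposed model and one baseline.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)

def subset_accuracies(correct, n_subsets=35, frac=0.7):
    """Accuracy on randomly drawn subsets of the test set (35 draws of 70% each)."""
    n = len(correct)
    k = int(frac * n)
    return np.array([
        correct[rng.choice(n, size=k, replace=False)].mean()
        for _ in range(n_subsets)
    ])

# Boolean arrays, one entry per test note, True where the model's prediction
# matches the gold label. Placeholder data here, for illustration only.
correct_proposed = rng.random(826) < 0.90   # ~90% accuracy, illustrative
correct_baseline = rng.random(826) < 0.85   # ~85% accuracy, illustrative

acc_proposed = subset_accuracies(correct_proposed)
acc_baseline = subset_accuracies(correct_baseline)

# Wilcoxon-Mann-Whitney test on the two sets of subset accuracies, with a
# Bonferroni-adjusted threshold for comparisons against multiple baselines.
n_comparisons = 6  # e.g., zero-/few-shot GPT-4o, SVM, RF, LR, MNB (assumed count)
stat, p = mannwhitneyu(acc_proposed, acc_baseline, alternative="greater")
print(f"U={stat:.1f}, p={p:.4g}, significant={p < 0.05 / n_comparisons}")
```

The test-set size (826 notes, roughly 20% of 4,129), the accuracy levels, and the number of baseline comparisons are illustrative assumptions, not reported values.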
Results: The proposed framework achieved the highest accuracy, with the fully adapted Longformer reaching 89.83%. Performance improvements over all baselines remained statistically significant after Bonferroni correction (p < .05). The framework demonstrated robust gains across different stigmatizing language types.
Discussion: This study underscores the value of domain-adaptive natural language processing (NLP) for detecting stigmatizing language in EHRs. The multi-stage transfer learning framework effectively captures subtle biases often missed by conventional models, enabling more objective and respectful clinical documentation.
Conclusion: This study offers a statistically validated, high-performing framework for detecting stigmatizing language in EHRs, supporting responsible AI and promoting equity in clinical care.