*Result*: Reproducing real-world clinical prediction models using the DIVE platform: A comparative validation study across three chronic diseases.
*Further Information*
*Objectives: This analysis evaluates the performance and reproducibility of the Data Insight Validation Engine (DIVE), a modular analytics interface implemented in Python to facilitate real-world evidence (RWE) generation from clinical (e.g., primary care) data. The platform was used to replicate three previously published studies on chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), and severe asthma, each originally developed using conventional statistical environments.
Methods: Using a primary care data source, DIVE was employed to replicate three studies on the development and validation of prediction scores using machine learning (ML) and traditional inferential analyses: an ML-based Generalized Additive<sup>2</sup> Model (GA<sup>2</sup>M) predicting CKD, and two Cox-based regression models for COPD exacerbations (CEX-HScore) and severe asthma (AS-HScore). The data covered over one million patients under the care of approximately 800 general practitioners (GPs) in Italy. Although the initial studies were carried out between 2013 and 2021, the DIVE-based investigations extended from 2013 to 2022, thereby also providing "external" temporal validation. Results obtained via DIVE were compared with the original published findings.
Results: DIVE demonstrated high fidelity in replicating published results. Using GA<sup>2</sup>M, the CKD model achieved discrimination (AUC: 89.2% vs. 89.3%) and average precision (22.1% vs. 22.4%) largely consistent with the original. The COPD model showed an AUC of 65.5%, a pseudo-R<sup>2</sup> of 12.7%, and a calibration slope of 1.01 (p = 0.317), consistent with the original CEX-HScore (AUC: 66%; pseudo-R<sup>2</sup>: 13%; calibration slope: 1.03 (p = 0.345)). For severe asthma, the prediction model exhibited an AUC of 71.9%, a pseudo-R<sup>2</sup> of 17.6%, and a calibration slope of 1.09 (p = 0.211), again aligned with the original AS-HScore (AUC: 72.5%; pseudo-R<sup>2</sup>: 18%; calibration slope: 1.12 (p = 0.182)).
Conclusion: DIVE represents a reliable, scalable, and interoperable solution for RWE analytics, demonstrating equivalence with traditional analytic methods and aligning with best practices in data reproducibility. Continued development toward federated (multi-database) analysis protocols and broader interoperability could expand its utility across several clinical domains.
(Copyright © 2026 Elsevier B.V. All rights reserved.)*
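For readers unfamiliar with the discrimination metrics reported in the Results, the following is a minimal illustrative sketch of how AUC and average precision can be computed from binary outcomes and predicted risk scores. It is not DIVE code and does not reflect the authors' implementation; the function names `roc_auc` and `average_precision` and the toy data are assumptions introduced here purely for illustration, and the average-precision helper assumes no tied scores.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (tied pairs count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values observed at the rank of each positive
    case, with cases ranked by descending score (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("at least one positive case is required")
    return precision_sum / hits


# Hypothetical toy data, not taken from the study.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

print(f"AUC: {roc_auc(y_true, scores):.4f}")                          # AUC: 0.9375
print(f"Average precision: {average_precision(y_true, scores):.4f}")  # Average precision: 0.9500
```

Average precision depends strongly on outcome prevalence, so a value near 22% alongside an AUC near 89%, as reported for the CKD model, is plausible when the outcome is comparatively rare.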
*Declaration of competing interest: The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: FL, EM, and IC provided consultations on protocol preparation for epidemiological studies and data analyses for AstraZeneca, Boehringer Ingelheim, GSK, and Chiesi. GM provided clinical consultancies for AstraZeneca, Boehringer Ingelheim, Novo Nordisk, GSK, and Chiesi. MG is an employee of AstraZeneca.*