*Result*: A flexible model for record linkage.

Title:
A flexible model for record linkage.
Authors:
Robach, Kayané (AUTHOR), Pas, Stéphanie L van der (AUTHOR), Wiel, Mark A van de (AUTHOR), Hof, Michel H (AUTHOR)
Source:
Journal of the Royal Statistical Society: Series C (Applied Statistics). Nov 2025, Vol. 74 Issue 4, p1100-1127. 28p.
Database:
Business Source Premier

*Further Information*

*Combining data from various sources empowers researchers to explore innovative questions, for example those raised in healthcare monitoring studies. However, the lack of a unique identifier often poses challenges. Record linkage procedures determine whether pairs of observations collected on different occasions belong to the same individual using partially identifying variables (e.g. birth year, postal code). Existing methodologies typically involve a compromise between computational efficiency and accuracy. Traditional approaches simplify this task by condensing information, yet they neglect dependencies among linkage decisions and disregard the one-to-one relationship required to establish coherent links. Modern approaches offer a comprehensive representation of the data generation process, at the cost of computational overhead and reduced flexibility. We propose a flexible method that adapts to varying data complexities, addressing registration errors and accommodating changes in the identifying information over time. Our approach balances accuracy and scalability, estimating the linkage using a Stochastic Expectation Maximization algorithm on a latent variable model. We illustrate the ability of our methodology to connect observations using large real data applications and demonstrate the robustness of our model to the quality of the linking variables in a simulation study. The proposed algorithm, FlexRL, is implemented and available in an open-source R package. [ABSTRACT FROM AUTHOR]

Copyright of Journal of the Royal Statistical Society: Series C (Applied Statistics) is the property of Oxford University Press / USA and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)*