*Result*: Statistical modelling of an outcome variable with integrated multi-omics.

Title:
Statistical modelling of an outcome variable with integrated multi-omics.
Authors:
Li H; Department of Mathematics, Radboud University, Heyendaalseweg, 6525 AJ, Nijmegen, Gelderland, The Netherlands. he.li@ru.nl.; Department of Mathematics and Computer Science, Eindhoven University of Technology, De Groene Loper, North Brabant, 5612 AE, Eindhoven, The Netherlands. he.li@ru.nl.
Gu Z; Medical Research Council Biostatistics Unit, University of Cambridge, Robinson Way, Cambridge, Cambridgeshire, CB2 0SR, UK.
El Bouhaddani S; Julius Centre, UMC Utrecht, Universiteitsweg, 3584 CG, Utrecht, Utrecht, The Netherlands.; Population Health Research, King Abdullah International Medical Research Center, Mecca, 22384, Jeddah, Saudi Arabia.; King Saud bin Abdulaziz University for Health Sciences, Mecca, 22384, Jeddah, Saudi Arabia.
Houwing-Duistermaat J; Department of Mathematics, Radboud University, Heyendaalseweg, 6525 AJ, Nijmegen, Gelderland, The Netherlands.; Department of Mathematics and Computer Science, Eindhoven University of Technology, De Groene Loper, North Brabant, 5612 AE, Eindhoven, The Netherlands.; Department of Statistics, University of Leeds, Woodhouse Lane, Leeds, West Yorkshire, LS2 9JT, UK.
Source:
BMC bioinformatics [BMC Bioinformatics] 2025 Dec 24; Vol. 27 (1), pp. 26. Date of Electronic Publication: 2025 Dec 24.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: BioMed Central; Country of Publication: England; NLM ID: 100965194; Publication Model: Electronic; Cited Medium: Internet; ISSN: 1471-2105 (Electronic); Linking ISSN: 14712105; NLM ISO Abbreviation: BMC Bioinformatics; Subsets: MEDLINE
Imprint Name(s):
Original Publication: [London] : BioMed Central, 2000-
Grant Information:
United Kingdom WT_ Wellcome Trust; 40-44000-98-2006 / 90030376507 ERA-Net E-Rare JTC 2018 (MSA-omics); 721815 European Union's Horizon 2020 research and innovation programme (IMforFUTURE)
Contributed Indexing:
Keywords: Latent variables; Low-dimensional representation; Metabolomics; Multivariate analysis; Polygenic score
Entry Date(s):
Date Created: 20251224 Date Completed: 20260131 Latest Revision: 20260203
Update Code:
20260203
PubMed Central ID:
PMC12859906
DOI:
10.1186/s12859-025-06349-0
PMID:
41444512
Database:
MEDLINE

*Further Information*

*Background: In studies that aim to model the relationship between an outcome variable and multiple omics datasets, it is often desirable to reduce the dimensionality of these datasets or to represent one omics dataset in terms of another. Several approaches exist for this purpose, including univariate methods such as polygenic scores, and multivariate methods. Multivariate approaches offer advantages by producing lower-dimensional integrative scores, capturing joint structures across datasets, and filtering out dataset-specific noise. In this paper, we describe one univariate and two multivariate methods, and evaluate their performance through simulations involving two correlated multivariate normally distributed omics datasets, as well as a combination of one multivariate normal and one fixed categorical dataset.
Results: We assess method performance using the root mean squared error (RMSE) when modelling the outcome variable as a function of the reduced omics representations. Multivariate methods generally perform well, particularly when a slightly higher number of components is used for integration. They outperform the univariate method in scenarios involving two normally distributed omics datasets and perform comparably in settings with one normal and one categorical dataset. In real data applications, including two metabolomics datasets from TwinsUK and a metabolomics-genetic dataset from ORCADES, all methods show similar performance in modelling body mass index.
Conclusions: Multivariate methods provide a valuable framework for summarizing multi-omics datasets into low-dimensional components suitable for outcome modelling. Even in the presence of non-normal data, these methods offer a promising alternative to high-dimensional univariate approaches.
(© 2025. The Author(s).)*
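
The evaluation described in the abstract (simulate two correlated multivariate normal datasets, reduce them to a few joint components, model the outcome on those components, and score the fit by RMSE) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the paper's methods, so the joint components here come from a plain SVD of the cross-covariance matrix (a PLS-SVD-style stand-in, not the authors' approach), and the function names, dimensions, and noise levels are hypothetical.

```python
import numpy as np


def simulate_two_omics(n=400, p=50, q=40, k=3, seed=0):
    """Simulate two normal datasets X, Y sharing k latent variables, plus an outcome z.

    All sizes and noise levels are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    T = rng.normal(size=(n, k))                                  # shared latent variables
    X = T @ rng.normal(size=(k, p)) + rng.normal(size=(n, p))    # first omics dataset
    Y = T @ rng.normal(size=(k, q)) + rng.normal(size=(n, q))    # second omics dataset
    z = T.sum(axis=1) + 0.5 * rng.normal(size=n)                 # outcome driven by the joint part
    return X, Y, z


def joint_loadings(X, Y, r):
    """Return the leading r left/right singular vectors of the cross-covariance X'Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return U[:, :r], Vt[:r].T


def rmse(observed, predicted):
    """Root mean squared error between two 1-D arrays."""
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def evaluate(r=3, seed=0):
    """Fit on the first half, report test RMSE of the component model and a mean-only baseline."""
    X, Y, z = simulate_two_omics(seed=seed)
    half = len(z) // 2
    train, test = slice(0, half), slice(half, None)

    # Loadings and centring constants are estimated on the training half only.
    mean_x, mean_y = X[train].mean(axis=0), Y[train].mean(axis=0)
    U, V = joint_loadings(X[train], Y[train], r)

    def design(Xs, Ys):
        scores = np.hstack([(Xs - mean_x) @ U, (Ys - mean_y) @ V])
        return np.column_stack([np.ones(len(scores)), scores])   # intercept + 2r scores

    coef, *_ = np.linalg.lstsq(design(X[train], Y[train]), z[train], rcond=None)
    predicted = design(X[test], Y[test]) @ coef
    baseline = np.full(predicted.shape, z[train].mean())
    return rmse(z[test], predicted), rmse(z[test], baseline)


if __name__ == "__main__":
    model_rmse, baseline_rmse = evaluate(r=3, seed=0)
    print(f"component model RMSE: {model_rmse:.3f}")
    print(f"mean-only baseline RMSE: {baseline_rmse:.3f}")
```

Varying `r` in `evaluate` gives a rough feel for how the number of components affects out-of-sample RMSE in this toy setting; it does not reproduce the paper's simulation design or results.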

*Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: ZG is currently an employee of Novartis Pharmaceuticals UK, but all work presented in this manuscript was completed while he was an employee of the University of Cambridge and UMC Utrecht.*