The use of omic modalities to dissect the molecular underpinnings of common diseases and traits is becoming increasingly common. But multi-omic traits can be genetically predicted, which enables highly cost-effective and powerful analyses for studies that do not have multi-omics¹. Here we examine a large cohort (the INTERVAL study²; n = 50,000 participants) with extensive multi-omic data for plasma proteomics (SomaScan, n = 3,175; Olink, n = 4,822), plasma metabolomics (Metabolon HD4, n = 8,153), serum metabolomics (Nightingale, n = 37,359) and whole-blood Illumina RNA sequencing (n = 4,136), and use machine learning to train genetic scores for 17,227 molecular traits, including 10,521 that reach Bonferroni-adjusted significance. We evaluate the performance of genetic scores through external validation across cohorts of individuals of European, Asian and African American ancestries. In addition, we show the utility of these multi-omic genetic scores by quantifying the genetic control of biological pathways and by generating a synthetic multi-omic dataset of the UK Biobank³ to identify disease associations using a phenome-wide scan. We highlight a series of biological insights with regard to genetic mechanisms in metabolism and canonical pathway associations with disease; for example, JAK–STAT signalling and coronary atherosclerosis. Finally, we develop a portal (https://www.omicspred.org/) to facilitate public access to all genetic scores and validation results, as well as to serve as a platform for future extensions and enhancements of multi-omic genetic scores.
Data availability
All of the genetic-score models trained in this study and GWAS summary statistics used to develop genetic scores are publicly accessible through the OmicsPred portal (https://www.omicspred.org/) under accession codes OPGS000001–OPGS017227. INTERVAL study data from this paper are available to bona fide researchers from email@example.com and information, including the data access policy, is available at http://www.donorhealth-btru.nihr.ac.uk/project/bioresource.
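As a small illustration of the accession range quoted above, the sketch below formats a score index as an accession code. The pattern (the prefix OPGS followed by a zero-padded six-digit number) is inferred only from the two endpoints OPGS000001 and OPGS017227; the function name `opgs_accession` is hypothetical and is not part of any OmicsPred interface.

```python
N_SCORES = 17_227  # number of accession codes in the stated range


def opgs_accession(index: int) -> str:
    """Format a 1-based score index as an accession code.

    Assumption: codes are 'OPGS' plus a zero-padded six-digit number,
    as suggested by the endpoints OPGS000001 and OPGS017227.
    """
    if not 1 <= index <= N_SCORES:
        raise ValueError(f"index must be between 1 and {N_SCORES}")
    return f"OPGS{index:06d}"
```

A helper like this could be used to enumerate the codes when looking up scores by accession, provided the inferred pattern holds for every entry.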
Code availability
The original code used to train the genetic scores with INTERVAL data, internally validate these scores and benchmark the performance of different genetic-score construction methods is available at https://github.com/xuyu-cam/atlas_genetic_scores_omic_traits.
References
Barbeira, A. N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 9, 1825 (2018).
Moore, C. et al. The INTERVAL trial to determine whether intervals between blood donations can be safely and acceptably decreased to optimise blood supply: study protocol for a randomised controlled trial. Trials 15, 363 (2014).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Ritchie, S. C. et al. Integrative analysis of the plasma proteome and polygenic risk of cardiometabolic diseases. Nat. Metab. 3, 1476–1483 (2021).
Lambert, S. A. et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 53, 420–425 (2021).
Adeyemo, A. et al. Responsible use of polygenic risk scores in the clinic: potential benefits, risks and gaps. Nat. Med. 27, 1876–1884 (2021).
Xu, Y. et al. Machine learning optimized polygenic scores for blood cell traits identify sex-specific trajectories and genetic correlations with disease. Cell Genomics 2, 100086 (2022).
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
Mosley, J. D. et al. Probing the virtual proteome to identify novel disease biomarkers. Circulation 138, 2469–2481 (2018).
Hutcheon, J. A., Chiolero, A. & Hanley, J. A. Random measurement error and regression dilution bias. Br. Med. J. 340, 1402–1406 (2010).
Pividori, M., Schoettler, N., Nicolae, D. L., Ober, C. & Im, H. K. Shared and distinct genetic risk factors for childhood-onset and adult-onset asthma: genome-wide and transcriptome-wide studies. Lancet Respir. Med. 7, 509–522 (2019).
Lannelongue, L., Grealey, J., Bateman, A. & Inouye, M. Ten simple rules to make your computing more environmentally sustainable. PLoS Comput. Biol. 17, e1009324 (2021).
Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285 (2016).
Pietzner, M. et al. Mapping the proteo-genomic convergence of human diseases. Science 374, eabj1541 (2021).
Igl, W., Johansson, A. & Gyllensten, U. The Northern Swedish Population Health Study (NSPHS)—a paradigmatic study in a rural population combining community health and basic research. Rural Remote Health 10, 1363 (2010).
McQuillan, R. et al. Runs of homozygosity in European populations. Am. J. Hum. Genet. 83, 359 (2008).
Kerr, S. M. et al. An actionable KCNH2 Long QT Syndrome variant detected by sequence and haplotype analysis in a population research cohort. Sci. Rep. 9, 10964 (2019).
Tan, K. H. X. et al. Cohort profile: the Singapore Multi-Ethnic Cohort (MEC) study. Int. J. Epidemiol. 47, 699–699j (2018).
Katz, D. H. et al. Whole genome sequence analysis of the plasma proteome in black adults provides novel insights into cardiovascular disease. Circulation 145, 357–370 (2021).
Fabregat, A. et al. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 46, D649–D655 (2018).
Wu, P. et al. Mapping ICD-10 and ICD-10-CM codes to phecodes: workflow development and initial evaluation. JMIR Med. Inf. 7, e14325 (2019).
Sarwar, N. et al. Interleukin-6 receptor pathways in coronary heart disease: a collaborative meta-analysis of 82 studies. Lancet 379, 1205–1213 (2012).
Haiman, C. A. et al. Levels of β-microseminoprotein in blood and risk of prostate cancer in multiple populations. J. Natl Cancer Inst. 105, 237–243 (2013).
Ding, E. L. et al. Sex hormone-binding globulin and risk of type 2 diabetes in women and men. N. Engl. J. Med. 361, 1152–1163 (2009).
Saini, V. Molecular mechanisms of insulin resistance in type 2 diabetes mellitus. World J. Diabetes 1, 68 (2010).
Qi, L. et al. Genetic variants in ABO blood group region, plasma soluble E-selectin levels and risk of type 2 diabetes. Hum. Mol. Genet. 19, 1856–1862 (2010).
Peters, M. C. et al. Plasma interleukin-6 concentrations, metabolic dysfunction, and asthma severity: a cross-sectional analysis of two cohorts. Lancet Respir. Med. 4, 574–584 (2016).
Banaganapalli, B. et al. Exploring celiac disease candidate pathways by global gene expression profiling and gene network cluster analysis. Sci. Rep. 10, 16290 (2020).
Gagliano Taliun, S. A. et al. Exploring and visualizing large-scale genetic associations by using PheWeb. Nat. Genet. 52, 550–552 (2020).
Kim, H. I. et al. Fine mapping and functional analysis reveal a role of SLC22A1 in acylcarnitine transport. Am. J. Hum. Genet. 101, 489 (2017).
Tamai, I. Pharmacological and pathophysiological roles of carnitine/organic cation transporters (OCTNs: SLC22A4, SLC22A5 and Slc22a21). Biopharm. Drug Dispos. 34, 29–44 (2013).
Chang, H. B., Gao, X., Nepomuceno, R., Hu, S. & Sun, D. Na+/H+ exchanger in the regulation of platelet activation and paradoxical effects of cariporide. Exp. Neurol. 272, 11–16 (2015).
de Vries, P. S. et al. Whole-genome sequencing study of serum peptide levels: the Atherosclerosis Risk in Communities study. Hum. Mol. Genet. 26, 3442–3450 (2017).
Babaev, V. R. et al. Loss of 2 Akt (protein kinase B) isoforms in hematopoietic cells diminished monocyte and macrophage survival and reduces atherosclerosis in Ldl receptor-null mice. Arterioscler. Thromb. Vasc. Biol. 39, 156–169 (2019).
Miteva, K. et al. Cardiotrophin-1 deficiency abrogates atherosclerosis progression. Sci. Rep. 10, 5791 (2020).
Agrawal, S. et al. Signal transducer and activator of transcription 1 is required for optimal foam cell formation and atherosclerotic lesion development. Circulation 115, 2939–2947 (2007).
Peltola, K. J. et al. Pim-1 kinase inhibits STAT5-dependent transcription via its interactions with SOCS1 and SOCS3. Blood 103, 3744–3750 (2004).
Khor, C. C. et al. CISH and susceptibility to infectious diseases. N. Engl. J. Med. 362, 2092–2101 (2010).
Baldini, C., Moriconi, F. R., Galimberti, S., Libby, P. & De Caterina, R. The JAK–STAT pathway: an emerging target for cardiovascular disease in rheumatoid arthritis and myeloproliferative neoplasms. Eur. Heart J. 42, 4389–4400 (2021).
Skah, S., Uchuya-Castillo, J., Sirakov, M. & Plateroti, M. The thyroid hormone nuclear receptors and the Wnt/β-catenin pathway: an intriguing liaison. Dev. Biol. 422, 71–82 (2017).
Chen, G. et al. Regulation of GSK-3β in the proliferation and apoptosis of human thyrocytes investigated using a GSK-3β-targeting RNAi adenovirus expression vector: involvement the Wnt/β-catenin pathway. Mol. Biol. Rep. 37, 2773–2779 (2009).
Ely, K. A., Bischoff, L. A. & Weiss, V. L. Wnt signaling in thyroid homeostasis and carcinogenesis. Genes 9, 204 (2018).
Haerlingen, B. et al. Small-molecule screening in zebrafish embryos identifies signaling pathways regulating early thyroid development. Thyroid 29, 1683–1703 (2019).
Narumi, S. et al. GWAS of thyroid dysgenesis identifies a risk locus at 2q33.3 linked to regulation of Wnt signaling. Hum. Mol. Genet. 31, 3967–3974 (2022).
Xu, D. et al. USP25 regulates Wnt signaling by controlling the stability of tankyrases. Genes Dev. 31, 1024–1035 (2017).
Lin, D. et al. Induction of USP25 by viral infection promotes innate antiviral responses by mediating the stabilization of TRAF3 and TRAF6. Proc. Natl Acad. Sci. USA 112, 11324–11329 (2015).
Nelson, J. K. et al. USP25 promotes pathological HIF-1-driven metabolic reprogramming and is a potential therapeutic target in pancreatic cancer. Nat. Commun. 13, 2070 (2022).
Blount, J. R., Burr, A. A., Denuc, A., Marfany, G. & Todi, S. V. Ubiquitin-specific protease 25 functions in endoplasmic reticulum-associated degradation. PLoS One 7, e36542 (2012).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, 1415–1429 (2016).
Sun, B. B. et al. Genomic atlas of the human plasma proteome. Nature 558, 73–79 (2018).
Lundberg, M., Eriksson, A., Tran, B., Assarsson, E. & Fredriksson, S. Homogeneous antibody-based proximity extension assays provide sensitive and specific detection of low-abundant proteins in human blood. Nucleic Acids Res. 39, e102 (2011).
Folkersen, L. et al. Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals. Nat. Metab. 2, 1135–1148 (2020).
Surendran, P. et al. Rare and common genetic determinants of metabolic individuality and their effects on human health. Nat. Med. 28, 2321–2332 (2022).
Karjalainen, M. K. et al. Genome-wide characterization of circulating metabolic biomarkers reveals substantial pleiotropy and novel disease pathways. Preprint at medRxiv https://doi.org/10.1101/2022.10.20.22281089 (2022).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Fort, A. et al. MBV: a method to solve sample mislabeling and detect technical bias in large combined genotype and sequencing assay datasets. Bioinformatics 33, 1895–1897 (2017).
Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).
Taylor-Weiner, A. et al. Scaling computational genomics to millions of individuals with GPUs. Genome Biol. 20, 228 (2019).
Stacklies, W., Redestig, H., Scholz, M., Walther, D. & Selbig, J. pcaMethods—a bioconductor package providing PCA methods for incomplete data. Bioinformatics 23, 1164–1167 (2007).
Pietzner, M. et al. Genetic architecture of host proteins involved in SARS-CoV-2 infection. Nat. Commun. 11, 6397 (2020).
Bretherick, A. D. et al. Linking protein to phenotype with Mendelian randomization detects 38 proteins with causal roles in human diseases and traits. PLoS Genet. 16, e1008785 (2020).
Kierczak, M. et al. Contribution of rare whole-genome sequencing variants to plasma protein levels and the missing heritability. Nat. Commun. 13, 2532 (2022).
Ritchie, S. C. et al. Quality control and removal of technical variation of NMR metabolic biomarker data in ~120,000 UK Biobank participants. Sci. Data 10, 64 (2023).
Wong, E. et al. The Singapore National Precision Medicine strategy. Nat. Genet. 55, 178–186 (2023).
Zhang, F. et al. Ancestry-agnostic estimation of DNA sample contamination from sequence reads. Genome Res. 30, 185–194 (2020).
Taylor, H. A. J. et al. Toward resolution of cardiovascular health disparities in African Americans: design and methods of the Jackson Heart Study. Ethn. Dis. 15, S6-4-17 (2005).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
Ngo, D. et al. Aptamer-based proteomic profiling reveals novel candidate biomarkers and pathways in cardiovascular disease. Circulation 134, 270–285 (2016).
Torkamani, A., Wineinger, N. E. & Topol, E. J. The personal and clinical utility of polygenic risk scores. Nat. Rev. Genet. 19, 581–590 (2018).
Chatterjee, N., Shi, J. & Garcia-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406 (2016).
Okser, S. et al. Regularized machine learning in the genetic prediction of complex traits. PLoS Genet. 10, e1004754 (2014).
Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).
Bishop, C. M. Pattern Recognition and Machine Learning (Springer, 2006).
Tipping, M. E. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1, 211–244 (2001).
Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2021).
Pietzner, M. et al. Synergistic insights into human health from aptamer- and antibody-based proteomic profiling. Nat. Commun. 12, 6822 (2021).
Davidson-Pilon, C. lifelines: survival analysis in Python. J. Open Source Softw. 4, 1317 (2019).
Lannelongue, L., Grealey, J. & Inouye, M. Green algorithms: quantifying the carbon footprint of computation. Adv. Sci. 8, 2100707 (2021).
Di Angelantonio, E. et al. Efficiency and safety of varying the frequency of whole blood donation (INTERVAL): a randomised trial of 45 000 donors. Lancet 390, 2360–2371 (2017).
Acknowledgements
Participants in the INTERVAL randomized controlled trial were recruited with the active collaboration of NHS Blood and Transplant England (https://www.nhsbt.nhs.uk/), which has supported field work and other elements of the trial. DNA extraction and genotyping were co-funded by the National Institute for Health and Care Research (NIHR), the NIHR BioResource (http://bioresource.nihr.ac.uk) and the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). The academic coordinating centre for INTERVAL was supported by core funding from the: NIHR Blood and Transplant Research Unit (BTRU) in Donor Health and Genomics (NIHR BTRU-2014-10024), NIHR BTRU in Donor Health and Behaviour (NIHR203337), UK Medical Research Council (MR/L003120/1), British Heart Foundation (SP/09/002; RG/13/13/30194; RG/18/13/33946) and NIHR Cambridge BRC (BRC-1215-20014; NIHR203312). A complete list of the investigators and contributors to the INTERVAL trial is provided in a previous study⁸⁵. The academic coordinating centre would like to thank blood donor centre staff and blood donors for participating in the INTERVAL trial. RNA-seq was funded as part of an alliance between the University of Cambridge and the AstraZeneca Centre for Genomics Research (AZ ref: 10033507) and by the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). INTERVAL SomaLogic assays were funded by Merck and the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). INTERVAL Olink Proteomics assays (Neurology panel) were funded by Biogen. INTERVAL Metabolon assays were funded by the NIHR BioResource, Wellcome Trust grant number 206194, BioMarin Pharmaceutical and the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). INTERVAL Nightingale Health NMR assays were funded by the European Commission Framework Programme 7 (HEALTH-F2-2012-279233). UKB data access was approved under projects 7439, 11193 and 19655, and all the participants gave their informed consent for health research.
The MEC is funded by individual research and clinical scientist award schemes from the Singapore National Medical Research Council (NMRC, including MOH-000271-00) and the Singapore Biomedical Research Council (BMRC), the Singapore Ministry of Health (MOH), the National University of Singapore (NUS) and the Singapore National University Health System (NUHS). This work on omics polygenic score transferability is supported by the NUS–Cambridge Seed Grant July 20201 (NUSMEDIR/Cambridge/2021-07/001). The metabolite biomarkers data were generated in collaboration with Nightingale Health. The protein biomarker data were generated in collaboration with Somalogic. The MEC whole-genome sequencing data made use of data generated as part of the Singapore National Precision Medicine (NPM) program funded by the Industry Alignment Fund (Pre-Positioning) (IAF-PP: H17/01/a0/007). NPM made use of data and samples collected in the following cohorts in Singapore: (1) the Health for Life in Singapore (HELIOS) study at the Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore (supported by grants from a Strategic Initiative at Lee Kong Chian School of Medicine, the Singapore Ministry of Health (MOH) under its Singapore Translational Research Investigator Award (NMRC/STaR/0028/2017) and the IAF-PP: H18/01/a0/016); (2) the Growing up in Singapore Towards Healthy Outcomes (GUSTO) study, which is jointly hosted by the National University Hospital (NUH), KK Women’s and Children’s Hospital (KKH), the National University of Singapore (NUS) and the Singapore Institute for Clinical Sciences (SICS), Agency for Science Technology and Research (A*STAR) (supported by the Singapore National Research Foundation under its Translational and Clinical Research (TCR) Flagship Programme and administered by the Singapore Ministry of Health’s National Medical Research Council (NMRC), Singapore-NMRC/TCR/004-NUS/2008; NMRC/TCR/012-NUHS/2014. 
Additional funding is provided by SICS and IAF-PP H17/01/a0/005); (3) the Singapore Epidemiology of Eye Diseases (SEED) cohort at Singapore Eye Research Institute (SERI) (supported by NMRC/CIRG/1417/2015; NMRC/CIRG/1488/2018 and NMRC/OFLCG/004/2018); (4) the MEC cohort (supported by NMRC grant 0838/2004, BMRC grants 03/1/27/18/216, 05/1/21/19/425 and 11/1/21/19/678, Ministry of Health, Singapore, National University of Singapore and National University Health System, Singapore); (5) the SingHealth Duke–NUS Institute of Precision Medicine (PRISM) cohort (supported by NMRC/CG/M006/2017_NHCS, NMRC/StaR/0011/2012, NMRC/StaR/ 0026/2015, Lee Foundation and Tanoto Foundation); (6) the TTSH Personalised Medicine Normal Controls (TTSH) cohort (supported by NMRC/CG12AUG17 and CGAug16M012). The views expressed are those of the author(s) and not necessarily those of the National Precision Medicine investigators, or institutional partners. We are grateful to all Fenland volunteers and to the general practitioners and practice staff for assistance with recruitment. We thank the Fenland Study Investigators, Fenland Study Co-ordination team and the Epidemiology Field, Data and Laboratory teams. Proteomic measurements were supported and governed by a collaboration agreement between the University of Cambridge and SomaLogic. The Fenland Study (10.22025/2017.10.101.00001) is funded by the Medical Research Council (MC_UU_12015/1). We further acknowledge support for genomics from the Medical Research Council (MC_PC_13046). ORCADES was supported by the Chief Scientist Office of the Scottish Government (CZB/4/276 and CZB/4/710), a Royal Society URF to J.F.W., the MRC Human Genetics Unit quinquennial programme ‘QTL in Health and Disease’, Arthritis Research UK and the European Union framework program 6 EUROSPAN project (contract no. LSHG-CT-2006-018947). DNA extractions were performed at the Edinburgh Clinical Research Facility, University of Edinburgh. 
We would like to acknowledge the contributions of the research nurses in Orkney, the administrative team in Edinburgh and the people of Orkney. The Viking Health Study Shetland (VIKING) was supported by the MRC Human Genetics Unit quinquennial programme grant ‘QTL in Health and Disease’. DNA extractions and genotyping were performed at the Edinburgh Clinical Research Facility, University of Edinburgh. We would like to acknowledge the contributions of the research nurses in Shetland, the administrative team in Edinburgh and the people of Shetland. We acknowledge support from the MRC Human Genetics Unit programme grant, ‘Quantitative traits in health and disease’ (U. MC_UU_00007/10). Whole-genome sequencing for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). Whole-genome sequencing for ‘NHLBI TOPMed: Jackson Heart Study’ (phs000964) was performed at the Northwest Genomics Center (HHSN268201100037C). Core support including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support including phenotype harmonization, data management, sample-identity QC and general program coordination were provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I). We acknowledge the studies and participants who provided biological samples and data for TOPMed. The Jackson Heart Study (JHS) is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), the Mississippi State Department of Health (HHSN268201800015I) and the University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I and HHSN268201800012I) contracts from the NHLBI and the National Institute on Minority Health and Health Disparities (NIMHD). 
We also thank the staff and participants of the JHS. JHS disclaimer: the views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the US Department of Health and Human Services. This work was also funded by the Swedish Research Council (2019-01497) and the Swedish Heart-Lung foundation (20200687). Y.X. and M.I. were supported by the UK Economic and Social Research Council (ES/T013192/1). S.C.R., L.L., C.F. and E.P. are funded by a BHF Programme Grant (RG/18/13/33946). C.L., M.P. and J.L. were funded by the Medical Research Council (MC_UU_00006/1 – Aetiology and Mechanisms). S.A.L. was supported by a Canadian Institutes of Health Research postdoctoral fellowship (MFE-171279). U.A.T. is supported by a US National Institutes of Health Mentored Clinical Scientist Development Award program (1K08HL161445-01A1). C.F. was supported by the Health Data Research UK. E.P. was funded by the EU/EFPIA Innovative Medicines Initiative Joint Undertaking BigData@Heart grant 116074, NIHR BTRU in Donor Health and Genomics (NIHR BTRU-2014-10024) and NIHR BTRU in Donor Health and Behaviour (NIHR203337). J.E.P. was supported by a Medical Research Foundation grant (MRF-042-0001-RG-PETE-C0839). E.E.D. is supported by a Wellcome Trust grant (206194, 220540/Z/20/A). R.E.G. is supported by a US National Institutes of Health grant for proteomics in the Jackson Heart Study (R01 HL133870) and an NIH contract to perform proteomics and metabolomics in multiple cohorts (HHSN268201600034I). J.D. holds a British Heart Foundation Professorship and a NIHR Senior Investigator Award. M.I. is supported by the Munz Chair of Cardiovascular Prediction and Prevention and the NIHR Cambridge Biomedical Research Centre (NIHR203312). This study was supported by the Victorian Government’s Operational Infrastructure Support (OIS) program. 
This work was supported by Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and Wellcome. This research was supported by an HDRUK Director’s Innovation Award (HDRUK2022.0130). We acknowledge B. Sun and T. Jiang for previous analyses of INTERVAL SomaScan and genotype QC, respectively. For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any author accepted manuscript version arising from this submission. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, NHSBT or the Department of Health and Social Care.
Competing interests
During the drafting of the manuscript, P.R.H.J.T. became a part-time employee of BioAge Labs, P.S. became a full-time employee of GSK and D.S.P. became a full-time employee of AstraZeneca. L.B. is an employee of BioMarin. J.D. serves on scientific advisory boards for AstraZeneca, Novartis and UK Biobank, and has received multiple grants from academic, charitable and industry sources outside of the submitted work. A.M. is an employee of Pfizer. A.S.B. reports institutional grants from AstraZeneca, Bayer, Biogen, BioMarin, Bioverativ, Novartis, Regeneron and Sanofi.
Peer review information
Nature thanks Heiko Runz, Bjarni Vilhjálmsson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Schematic framework for the development and validation of multi-omic genetic scores.
This figure presents the overall study design for the development of genetic scores for multi-omic traits across five platforms (Nightingale, Metabolon, Olink, SomaScan and RNA-seq) using INTERVAL data as well as their validation in seven external cohorts of multiple ancestries (European, Asian-Chinese, Asian-Malay, Asian-Indian and African American).
Extended Data Fig. 2 R² performance comparison between Bayesian ridge, LDpred2 and P+T for Metabolon traits in external validation (INTERVAL withheld set).
This figure compares the R² performance between Bayesian ridge (BR; on the set of genome-wide variants with p-value < 5 × 10⁻⁸; x-axis) and LDpred2 (HapMap3 variant set), and between BR and P+T (variant sets of two p-value thresholds: 5 × 10⁻⁸ and 1 × 10⁻³) for 20 randomly selected Metabolon traits in external validation (INTERVAL withheld set; Methods). P-values in the GWAS for omic traits were derived by t-test in linear regression and all tests were two-sided.
Extended Data Fig. 3 Distribution of the number of variants in the genetic scores and the correlations between performance (R²) of genetic scores and the number of variants comprising the score.
The density plots show the distribution of the number of variants comprising the genetic scores for each platform. The scatter plots show the change in R² in the internal validation with the number of variants in the genetic-score model.
Extended Data Fig. 4
The scatter plots compare the Spearman correlation scores between internal validation and external validation with a European cohort on each platform. Points are coloured by the variant missingness rate in the external cohort and the blue line shows the linear model fitted to the data points. This analysis included all the genetic scores developed in this study.
Extended Data Fig. 5 Validation of the performance change of genetic scores by their variant missing rates in external cohorts of different ancestries.
External validation results in European cohorts were merged within each platform to increase the statistical power of this analysis; these include the NSPHS and ORCADES validations for Olink, and the ORCADES and VIKING validations for Nightingale. Note that the INTERVAL withheld subset validations and the UKB validation for Nightingale traits were excluded from this analysis because there is little or no variant missingness in the external cohort (or INTERVAL withheld subset). Validation results in each platform were ranked by the variant missing rate of the genetic-score models in the external cohort and grouped into tertiles, where the variant missing rate is the number of variants missing in the validation cohort divided by the total number of variants in the genetic score. This figure presents the mean and standard error (SE) of the R² performance change of genetic scores between internal and external validation across tertiles of validation results. The analysis included validation results of 2,129 SomaScan, 603 Olink, 455 Metabolon and 423 Nightingale traits (traits can overlap for the same platform across multiple validation cohorts) for European (EUR); 2,047 SomaScan and 139 Nightingale traits for Chinese (CN), Indian (IN) and Malay (MA); and 820 SomaScan traits for African American (AF).
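The tertile summary described in this caption can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function names are hypothetical, the R² change is assumed to be supplied as one number per validation result (the caption does not say whether it is a difference or a ratio), and tertiles are formed by splitting the ranked results into three near-equal groups.

```python
from math import sqrt
from statistics import mean, stdev


def variant_missing_rate(n_missing: int, n_total: int) -> float:
    """Variants missing in the validation cohort / total variants in the score."""
    if n_total <= 0:
        raise ValueError("a genetic score must contain at least one variant")
    return n_missing / n_total


def summarise_by_tertile(results):
    """Summarise R2 change by tertile of variant missing rate.

    `results` is a list of (missing_rate, r2_change) pairs, one per
    validation result. Returns three (mean, standard error) pairs,
    ordered from the lowest to the highest missing-rate tertile.
    """
    if len(results) < 6:
        raise ValueError("need at least two results per tertile")
    ranked = sorted(results, key=lambda pair: pair[0])
    n = len(ranked)
    cuts = [0, n // 3, (2 * n) // 3, n]  # assumed near-equal split
    summary = []
    for start, stop in zip(cuts, cuts[1:]):
        changes = [change for _, change in ranked[start:stop]]
        # Standard error of the mean: sample SD / sqrt(group size).
        summary.append((mean(changes), stdev(changes) / sqrt(len(changes))))
    return summary
```

Ranking before splitting means that ties in missing rate can fall on either side of a tertile boundary; a real analysis might prefer quantile cut-points instead.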
Extended Data Fig. 6 Performance (R²) of genetic scores for Nightingale and SomaScan in external cohorts of various ancestries relative to R² in internal validation (INTERVAL).
a, Nightingale; b, SomaScan. Transferability was only tested if the genetic score had a significant (two-sided t-test; Bonferroni-corrected p-value < 0.05 for all the 17,227 omic traits tested) association with the directly measured molecular trait in internal validation (n = 1631, 7471, 964, 635 and 827 for Metabolon, Nightingale, Olink, SomaScan and RNA-seq traits, respectively). This resulted in 137 and 136 Nightingale metabolic traits for UKB (n = 98,245 participants) and MEC (Chinese, n = 1,067; Indian, n = 654; Malay, n = 634), respectively, and 949, 1052 and 378 SomaScan proteins for FENLAND (n = 8,832), MEC (Chinese, n = 645; Indian, n = 564; Malay, n = 563) and JHS (n = 1,852), respectively. Violin plots show distributions of the ratio of R² values. Black points show mean values and error bars are standard errors.
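The significance filter and the plotted ratio can be illustrated with a short sketch. It assumes the Bonferroni correction multiplies each p-value by the 17,227 traits tested (capped at 1) and that the ratio is external R² divided by internal R²; the function names are hypothetical and are not taken from the study's code.

```python
N_TRAITS_TESTED = 17_227  # omic traits tested, as stated in the caption
ALPHA = 0.05


def bonferroni_adjust(p_value: float, n_tests: int = N_TRAITS_TESTED) -> float:
    """Bonferroni-corrected p-value: raw p-value times the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


def is_significant(p_value: float) -> bool:
    """True if the corrected p-value is below the 0.05 threshold."""
    return bonferroni_adjust(p_value) < ALPHA


def r2_ratio(external_r2: float, internal_r2: float) -> float:
    """R2 in an external cohort relative to R2 in internal validation (assumed ratio)."""
    if internal_r2 <= 0:
        raise ValueError("internal R2 must be positive to form a ratio")
    return external_r2 / internal_r2
```

Under these assumptions a raw p-value must fall below roughly 2.9 × 10⁻⁶ (0.05 / 17,227) for a score to be carried forward to the transferability analysis.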
Extended Data Fig. 7 Performance (R2) of genetic scores between longitudinal samples and across ancestries in the MEC cohort.
Paired samples include a baseline and a revisit sample from each individual run on SomaScan and Nightingale for MEC Chinese (N = 403 and 721 individuals), MEC Indian (N = 356 and 376) and MEC Malay (N = 353 and 363). Blue lines denote linear models fitted to each set of data points and the shaded areas represent 95% confidence intervals where applicable. There are no Nightingale genetic scores with an R² > 0.15 in both internal and MEC validation, so a–c only show R² in the range of [0, 0.15] for clarity. The inset box plots at the bottom right of d–f show the validation results of those traits with baseline validation performance (R²) between 0 and 0.025 in each ancestry.
Extended Data Fig. 8
This analysis looked at all the lowest-level pathways of the super-pathways curated in Reactome. Where at least one protein with a genetic score is included among the entities of a lowest-level pathway, we consider that pathway to be covered by the proteins of this study. This figure shows, for each super-pathway, the percentage of its lowest-level pathways that are covered by a group of proteins (grouped by R² in internal validation).
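The coverage rule in this caption amounts to a set-intersection count, sketched below. The data structures and the function name `pathway_coverage` are assumptions made for illustration; the sketch does not query Reactome and the identifiers used in any call to it would be placeholders.

```python
def pathway_coverage(lowest_level_pathways, scored_proteins):
    """Percentage of lowest-level pathways covered by a group of proteins.

    `lowest_level_pathways` maps a pathway name to the set of proteins among
    its entities (all pathways belonging to one super-pathway).
    `scored_proteins` is the set of proteins with a genetic score in the
    group of interest. A pathway counts as covered if it contains at least
    one scored protein.
    """
    if not lowest_level_pathways:
        raise ValueError("no pathways supplied")
    scored = set(scored_proteins)
    covered = sum(
        1 for members in lowest_level_pathways.values() if scored & set(members)
    )
    return 100.0 * covered / len(lowest_level_pathways)
```

Calling the function once per super-pathway and per R² group would give the grid of percentages that the figure summarises.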
Extended Data Fig. 9 Key features of the OmicsPred portal for accessing genetic scores of multi-omic traits.
a, Organization of genetic scores on the portal. b, Example of how biomolecular traits and their genetic-score-related information can be explored. c, Example of how summary statistics of training and validation cohorts are presented. d, Example of how validation results and genetic-score models can be downloaded. e, Example of how validation results and trait-related information can be visualized.
About this article
Cite this article
Xu, Y., Ritchie, S.C., Liang, Y. et al. An atlas of genetic scores to predict multi-omic traits. Nature 616, 123–131 (2023). https://doi.org/10.1038/s41586-023-05844-9