  • Article

An atlas of genetic scores to predict multi-omic traits

Abstract

The use of omic modalities to dissect the molecular underpinnings of common diseases and traits is becoming increasingly common. But multi-omic traits can be genetically predicted, which enables highly cost-effective and powerful analyses for studies that do not have multi-omics (ref. 1). Here we examine a large cohort (the INTERVAL study, ref. 2; n = 50,000 participants) with extensive multi-omic data for plasma proteomics (SomaScan, n = 3,175; Olink, n = 4,822), plasma metabolomics (Metabolon HD4, n = 8,153), serum metabolomics (Nightingale, n = 37,359) and whole-blood Illumina RNA sequencing (n = 4,136), and use machine learning to train genetic scores for 17,227 molecular traits, including 10,521 that reach Bonferroni-adjusted significance. We evaluate the performance of genetic scores through external validation across cohorts of individuals of European, Asian and African American ancestries. In addition, we show the utility of these multi-omic genetic scores by quantifying the genetic control of biological pathways and by generating a synthetic multi-omic dataset of the UK Biobank (ref. 3) to identify disease associations using a phenome-wide scan. We highlight a series of biological insights with regard to genetic mechanisms in metabolism and canonical pathway associations with disease; for example, JAK–STAT signalling and coronary atherosclerosis. Finally, we develop a portal (https://www.omicspred.org/) to facilitate public access to all genetic scores and validation results, as well as to serve as a platform for future extensions and enhancements of multi-omic genetic scores.
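
To make the idea of a genetic score concrete, the following is a minimal sketch on synthetic data, not the authors' pipeline. It assumes a linear score (a weighted sum of allele dosages), evaluates it by the squared correlation with a measured trait, and computes a Bonferroni threshold. The function names, the simulated weights and the choice of 17,227 as the number of tests are all assumptions made for illustration; the abstract does not specify the training method or the exact correction used.

```python
import numpy as np


def genetic_score(dosages: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Linear genetic score: weighted sum of allele dosages per individual.

    dosages: (n_individuals, n_variants) array with values in {0, 1, 2}.
    weights: (n_variants,) array of per-variant effect weights.
    """
    return dosages @ weights


def r_squared(score: np.ndarray, trait: np.ndarray) -> float:
    """Squared Pearson correlation between predicted score and measured trait."""
    r = np.corrcoef(score, trait)[0, 1]
    return float(r ** 2)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Synthetic example (all values are simulated, not taken from the study).
rng = np.random.default_rng(0)
n_people, n_variants = 500, 20
dosages = rng.integers(0, 3, size=(n_people, n_variants)).astype(float)
weights = rng.normal(0.0, 0.1, size=n_variants)     # hypothetical weights
score = genetic_score(dosages, weights)
trait = score + rng.normal(0.0, 1.0, size=n_people)  # genetic part plus noise

r2 = r_squared(score, trait)
# Assumption: one test per trained trait (17,227 traits in the abstract).
threshold = bonferroni_threshold(0.05, 17_227)
print(f"R^2 = {r2:.3f}, Bonferroni threshold = {threshold:.2e}")
```

In practice the weights would come from a model trained on a cohort with both genotypes and omic measurements, and the score would then be applied to genotypes from a cohort that lacks those measurements.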

Fig. 1: Performance of multi-omic genetic scores in internal validation.
Fig. 2: External validation of genetic scores in cohorts of European ancestry.
Fig. 3: Transferability of genetic scores to cohorts of Asian and African American ancestries.
Fig. 4: Applications of genetic scores of multi-omic traits.
Fig. 5: JAK–STAT and WNT signalling pathways.

Data availability

All of the genetic-score models trained in this study and GWAS summary statistics used to develop genetic scores are publicly accessible through the OmicsPred portal (https://www.omicspred.org/) under accession codes OPGS000001–OPGS017227. INTERVAL study data from this paper are available to bona fide researchers from helpdesk@intervalstudy.org.uk and information, including the data access policy, is available at http://www.donorhealth-btru.nihr.ac.uk/project/bioresource.
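
The accession range above can be enumerated programmatically. The sketch below assumes only what the text shows: identifiers consist of the prefix OPGS followed by a zero-padded six-digit number running from 1 to 17,227. The helper names are hypothetical, and no portal API is implied.

```python
PREFIX = "OPGS"
FIRST, LAST = 1, 17_227  # range stated in the data availability section


def opgs_id(n: int) -> str:
    """Format an integer as an accession code, e.g. 1 -> 'OPGS000001'."""
    if not FIRST <= n <= LAST:
        raise ValueError(f"accession number out of range: {n}")
    return f"{PREFIX}{n:06d}"


def parse_opgs_id(code: str) -> int:
    """Recover the integer from an accession code, validating its shape."""
    digits = code[len(PREFIX):]
    if not code.startswith(PREFIX) or len(digits) != 6 or not digits.isdigit():
        raise ValueError(f"not a valid accession code: {code!r}")
    n = int(digits)
    if not FIRST <= n <= LAST:
        raise ValueError(f"accession number out of range: {n}")
    return n


all_ids = [opgs_id(n) for n in range(FIRST, LAST + 1)]
print(all_ids[0], all_ids[-1], len(all_ids))
```

The number of identifiers in the range matches the 17,227 molecular traits reported in the abstract.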

Code availability

The original codes used to train the genetic scores with INTERVAL data, internally validate these scores and benchmark the performance of different genetic-score construction methods are available at https://github.com/xuyu-cam/atlas_genetic_scores_omic_traits.

References

  1. Barbeira, A. N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 9, 1825 (2018).

  2. Moore, C. et al. The INTERVAL trial to determine whether intervals between blood donations can be safely and acceptably decreased to optimise blood supply: study protocol for a randomised controlled trial. Trials 15, 363 (2014).

  3. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  4. Ritchie, S. C. et al. Integrative analysis of the plasma proteome and polygenic risk of cardiometabolic diseases. Nat. Metab. 3, 1476–1483 (2021).

  5. Lambert, S. A. et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 53, 420–425 (2021).

  6. Adeyemo, A. et al. Responsible use of polygenic risk scores in the clinic: potential benefits, risks and gaps. Nat. Med. 27, 1876–1884 (2021).

  7. Xu, Y. et al. Machine learning optimized polygenic scores for blood cell traits identify sex-specific trajectories and genetic correlations with disease. Cell Genomics 2, 100086 (2022).

  8. Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).

  9. Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).

  10. Mosley, J. D. et al. Probing the virtual proteome to identify novel disease biomarkers. Circulation 138, 2469–2481 (2018).

  11. Hutcheon, J. A., Chiolero, A. & Hanley, J. A. Random measurement error and regression dilution bias. Br. Med. J. 340, 1402–1406 (2010).

  12. Pividori, M., Schoettler, N., Nicolae, D. L., Ober, C. & Im, H. K. Shared and distinct genetic risk factors for childhood-onset and adult-onset asthma: genome-wide and transcriptome-wide studies. Lancet Respir. Med. 7, 509–522 (2019).

  13. Lannelongue, L., Grealey, J., Bateman, A. & Inouye, M. Ten simple rules to make your computing more environmentally sustainable. PLoS Comput. Biol. 17, e1009324 (2021).

  14. Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285 (2016).

  15. Pietzner, M. et al. Mapping the proteo-genomic convergence of human diseases. Science 374, eabj1541 (2021).

  16. Igl, W., Johansson, A. & Gyllensten, U. The Northern Swedish Population Health Study (NSPHS)—a paradigmatic study in a rural population combining community health and basic research. Rural Remote Health 10, 1363 (2010).

  17. McQuillan, R. et al. Runs of homozygosity in European populations. Am. J. Hum. Genet. 83, 359 (2008).

  18. Kerr, S. M. et al. An actionable KCNH2 Long QT Syndrome variant detected by sequence and haplotype analysis in a population research cohort. Sci. Rep. 9, 10964 (2019).

  19. Tan, K. H. X. et al. Cohort profile: the Singapore Multi-Ethnic Cohort (MEC) study. Int. J. Epidemiol. 47, 699–699j (2018).

  20. Katz, D. H. et al. Whole genome sequence analysis of the plasma proteome in black adults provides novel insights into cardiovascular disease. Circulation 145, 357–370 (2021).

  21. Fabregat, A. et al. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 46, D649–D655 (2018).

  22. Wu, P. et al. Mapping ICD-10 and ICD-10-CM codes to phecodes: workflow development and initial evaluation. JMIR Med. Inf. 7, e14325 (2019).

  23. Sarwar, N. et al. Interleukin-6 receptor pathways in coronary heart disease: a collaborative meta-analysis of 82 studies. Lancet 379, 1205–1213 (2012).

  24. Haiman, C. A. et al. Levels of β-microseminoprotein in blood and risk of prostate cancer in multiple populations. J. Natl Cancer Inst. 105, 237–243 (2013).

  25. Ding, E. L. et al. Sex hormone-binding globulin and risk of type 2 diabetes in women and men. N. Engl. J. Med. 361, 1152–1163 (2009).

  26. Saini, V. Molecular mechanisms of insulin resistance in type 2 diabetes mellitus. World J. Diabetes 1, 68 (2010).

  27. Qi, L. et al. Genetic variants in ABO blood group region, plasma soluble E-selectin levels and risk of type 2 diabetes. Hum. Mol. Genet. 19, 1856–1862 (2010).

  28. Peters, M. C. et al. Plasma interleukin-6 concentrations, metabolic dysfunction, and asthma severity: a cross-sectional analysis of two cohorts. Lancet Respir. Med. 4, 574–584 (2016).

  29. Banaganapalli, B. et al. Exploring celiac disease candidate pathways by global gene expression profiling and gene network cluster analysis. Sci. Rep. 10, 16290 (2020).

  30. Gagliano Taliun, S. A. et al. Exploring and visualizing large-scale genetic associations by using PheWeb. Nat. Genet. 52, 550–552 (2020).

  31. Kim, H. I. et al. Fine mapping and functional analysis reveal a role of SLC22A1 in acylcarnitine transport. Am. J. Hum. Genet. 101, 489 (2017).

  32. Tamai, I. Pharmacological and pathophysiological roles of carnitine/organic cation transporters (OCTNs: SLC22A4, SLC22A5 and Slc22a21). Biopharm. Drug Dispos. 34, 29–44 (2013).

  33. Chang, H. B., Gao, X., Nepomuceno, R., Hu, S. & Sun, D. Na+/H+ exchanger in the regulation of platelet activation and paradoxical effects of cariporide. Exp. Neurol. 272, 11–16 (2015).

  34. de Vries, P. S. et al. Whole-genome sequencing study of serum peptide levels: the Atherosclerosis Risk in Communities study. Hum. Mol. Genet. 26, 3442–3450 (2017).

  35. Babaev, V. R. et al. Loss of 2 Akt (protein kinase B) isoforms in hematopoietic cells diminished monocyte and macrophage survival and reduces atherosclerosis in Ldl receptor-null mice. Arterioscler. Thromb. Vasc. Biol. 39, 156–169 (2019).

  36. Miteva, K. et al. Cardiotrophin-1 deficiency abrogates atherosclerosis progression. Sci. Rep. 10, 5791 (2020).

  37. Agrawal, S. et al. Signal transducer and activator of transcription 1 is required for optimal foam cell formation and atherosclerotic lesion development. Circulation 115, 2939–2947 (2007).

  38. Peltola, K. J. et al. Pim-1 kinase inhibits STAT5-dependent transcription via its interactions with SOCS1 and SOCS3. Blood 103, 3744–3750 (2004).

  39. Khor, C. C. et al. CISH and susceptibility to infectious diseases. N. Engl. J. Med. 362, 2092–2101 (2010).

  40. Baldini, C., Moriconi, F. R., Galimberti, S., Libby, P. & De Caterina, R. The JAK–STAT pathway: an emerging target for cardiovascular disease in rheumatoid arthritis and myeloproliferative neoplasms. Eur. Heart J. 42, 4389–4400 (2021).

  41. Skah, S., Uchuya-Castillo, J., Sirakov, M. & Plateroti, M. The thyroid hormone nuclear receptors and the Wnt/β-catenin pathway: an intriguing liaison. Dev. Biol. 422, 71–82 (2017).

  42. Chen, G. et al. Regulation of GSK-3β in the proliferation and apoptosis of human thyrocytes investigated using a GSK-3β-targeting RNAi adenovirus expression vector: involvement the Wnt/β-catenin pathway. Mol. Biol. Rep. 37, 2773–2779 (2009).

  43. Ely, K. A., Bischoff, L. A. & Weiss, V. L. Wnt signaling in thyroid homeostasis and carcinogenesis. Genes 9, 204 (2018).

  44. Haerlingen, B. et al. Small-molecule screening in zebrafish embryos identifies signaling pathways regulating early thyroid development. Thyroid 29, 1683–1703 (2019).

  45. Narumi, S. et al. GWAS of thyroid dysgenesis identifies a risk locus at 2q33.3 linked to regulation of Wnt signaling. Hum. Mol. Genet. 31, 3967–3974 (2022).

  46. Xu, D. et al. USP25 regulates Wnt signaling by controlling the stability of tankyrases. Genes Dev. 31, 1024–1035 (2017).

  47. Lin, D. et al. Induction of USP25 by viral infection promotes innate antiviral responses by mediating the stabilization of TRAF3 and TRAF6. Proc. Natl Acad. Sci. USA 112, 11324–11329 (2015).

  48. Nelson, J. K. et al. USP25 promotes pathological HIF-1-driven metabolic reprogramming and is a potential therapeutic target in pancreatic cancer. Nat. Commun. 13, 2070 (2022).

  49. Blount, J. R., Burr, A. A., Denuc, A., Marfany, G. & Todi, S. V. Ubiquitin-specific protease 25 functions in endoplasmic reticulum-associated degradation. PLoS One 7, e36542 (2012).

  50. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).

  51. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).

  52. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  53. Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, 1415–1429 (2016).

  54. Sun, B. B. et al. Genomic atlas of the human plasma proteome. Nature 558, 73–79 (2018).

  55. Lundberg, M., Eriksson, A., Tran, B., Assarsson, E. & Fredriksson, S. Homogeneous antibody-based proximity extension assays provide sensitive and specific detection of low-abundant proteins in human blood. Nucleic Acids Res. 39, e102 (2011).

  56. Folkersen, L. et al. Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals. Nat. Metab. 2, 1135–1148 (2020).

  57. Surendran, P. et al. Rare and common genetic determinants of metabolic individuality and their effects on human health. Nat. Med. 28, 2321–2332 (2022).

  58. Karjalainen, M. K. et al. Genome-wide characterization of circulating metabolic biomarkers reveals substantial pleiotropy and novel disease pathways. Preprint at medRxiv https://doi.org/10.1101/2022.10.20.22281089 (2022).

  59. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  60. Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).

  61. Fort, A. et al. MBV: a method to solve sample mislabeling and detect technical bias in large combined genotype and sequencing assay datasets. Bioinformatics 33, 1895–1897 (2017).

  62. Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).

  63. Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).

  64. Taylor-Weiner, A. et al. Scaling computational genomics to millions of individuals with GPUs. Genome Biol. 20, 228 (2019).

  65. Stacklies, W., Redestig, H., Scholz, M., Walther, D. & Selbig, J. pcaMethods—a bioconductor package providing PCA methods for incomplete data. Bioinformatics 23, 1164–1167 (2007).

  66. Pietzner, M. et al. Genetic architecture of host proteins involved in SARS-CoV-2 infection. Nat. Commun. 11, 6397 (2020).

  67. Bretherick, A. D. et al. Linking protein to phenotype with Mendelian randomization detects 38 proteins with causal roles in human diseases and traits. PLoS Genet. 16, e1008785 (2020).

  68. Kierczak, M. et al. Contribution of rare whole-genome sequencing variants to plasma protein levels and the missing heritability. Nat. Commun. 13, 2532 (2022).

  69. Ritchie, S. C. et al. Quality control and removal of technical variation of NMR metabolic biomarker data in ~120,000 UK Biobank participants. Sci. Data 10, 64 (2023).

  70. Wong, E. et al. The Singapore National Precision Medicine strategy. Nat. Genet. 55, 178–186 (2023).

  71. Zhang, F. et al. Ancestry-agnostic estimation of DNA sample contamination from sequence reads. Genome Res. 30, 185–194 (2020).

  72. Taylor, H. A. J. et al. Toward resolution of cardiovascular health disparities in African Americans: design and methods of the Jackson Heart Study. Ethn. Dis. 15, S6-4-17 (2005).

  73. Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).

  74. Ngo, D. et al. Aptamer-based proteomic profiling reveals novel candidate biomarkers and pathways in cardiovascular disease. Circulation 134, 270–285 (2016).

  75. Torkamani, A., Wineinger, N. E. & Topol, E. J. The personal and clinical utility of polygenic risk scores. Nat. Rev. Genet. 19, 581–590 (2018).

  76. Chatterjee, N., Shi, J. & Garcia-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406 (2016).

  77. Okser, S. et al. Regularized machine learning in the genetic prediction of complex traits. PLoS Genet. 10, e1004754 (2014).

  78. Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).

  79. Bishop, C. M. Pattern Recognition and Machine Learning (Springer, 2006).

  80. Tipping, M. E. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1, 211–244 (2001).

  81. Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2021).

  82. Pietzner, M. et al. Synergistic insights into human health from aptamer- and antibody-based proteomic profiling. Nat. Commun. 12, 6822 (2021).

  83. Davidson-Pilon, C. lifelines: survival analysis in Python. J. Open Source Softw. 4, 1317 (2019).

  84. Lannelongue, L., Grealey, J. & Inouye, M. Green algorithms: quantifying the carbon footprint of computation. Adv. Sci. 8, 2100707 (2021).

  85. Di Angelantonio, E. et al. Efficiency and safety of varying the frequency of whole blood donation (INTERVAL): a randomised trial of 45 000 donors. Lancet 390, 2360–2371 (2017).

Acknowledgements

Participants in the INTERVAL randomized controlled trial were recruited with the active collaboration of NHS Blood and Transplant England (https://www.nhsbt.nhs.uk/), which has supported field work and other elements of the trial. DNA extraction and genotyping were co-funded by the National Institute for Health and Care Research (NIHR), the NIHR BioResource (http://bioresource.nihr.ac.uk) and the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). The academic coordinating centre for INTERVAL was supported by core funding from the NIHR Blood and Transplant Research Unit (BTRU) in Donor Health and Genomics (NIHR BTRU-2014-10024), NIHR BTRU in Donor Health and Behaviour (NIHR203337), UK Medical Research Council (MR/L003120/1), British Heart Foundation (SP/09/002; RG/13/13/30194; RG/18/13/33946) and NIHR Cambridge BRC (BRC-1215-20014; NIHR203312). A complete list of the investigators and contributors to the INTERVAL trial is provided in a previous study (ref. 85). The academic coordinating centre would like to thank blood donor centre staff and blood donors for participating in the INTERVAL trial. RNA-seq was funded as part of an alliance between the University of Cambridge and the AstraZeneca Centre for Genomics Research (AZ ref: 10033507) and by the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). INTERVAL SomaLogic assays were funded by Merck and the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). INTERVAL Olink Proteomics assays (Neurology panel) were funded by Biogen. INTERVAL Metabolon assays were funded by the NIHR BioResource, Wellcome Trust grant number 206194, BioMarin Pharmaceutical and the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). INTERVAL Nightingale Health NMR assays were funded by the European Commission Framework Programme 7 (HEALTH-F2-2012-279233). UKB data access was approved under projects 7439, 11193 and 19655, and all the participants gave their informed consent for health research.
The MEC is funded by individual research and clinical scientist award schemes from the Singapore National Medical Research Council (NMRC, including MOH-000271-00) and the Singapore Biomedical Research Council (BMRC), the Singapore Ministry of Health (MOH), the National University of Singapore (NUS) and the Singapore National University Health System (NUHS). This work on omics polygenic score transferability is supported by the NUS–Cambridge Seed Grant July 2021 (NUSMEDIR/Cambridge/2021-07/001). The metabolite biomarker data were generated in collaboration with Nightingale Health. The protein biomarker data were generated in collaboration with SomaLogic. The MEC whole-genome sequencing data made use of data generated as part of the Singapore National Precision Medicine (NPM) program funded by the Industry Alignment Fund (Pre-Positioning) (IAF-PP: H17/01/a0/007). NPM made use of data and samples collected in the following cohorts in Singapore: (1) the Health for Life in Singapore (HELIOS) study at the Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore (supported by grants from a Strategic Initiative at Lee Kong Chian School of Medicine, the Singapore Ministry of Health (MOH) under its Singapore Translational Research Investigator Award (NMRC/STaR/0028/2017) and the IAF-PP: H18/01/a0/016); (2) the Growing up in Singapore Towards Healthy Outcomes (GUSTO) study, which is jointly hosted by the National University Hospital (NUH), KK Women’s and Children’s Hospital (KKH), the National University of Singapore (NUS) and the Singapore Institute for Clinical Sciences (SICS), Agency for Science Technology and Research (A*STAR) (supported by the Singapore National Research Foundation under its Translational and Clinical Research (TCR) Flagship Programme and administered by the Singapore Ministry of Health’s National Medical Research Council (NMRC), Singapore-NMRC/TCR/004-NUS/2008; NMRC/TCR/012-NUHS/2014. Additional funding is provided by SICS and IAF-PP H17/01/a0/005); (3) the Singapore Epidemiology of Eye Diseases (SEED) cohort at Singapore Eye Research Institute (SERI) (supported by NMRC/CIRG/1417/2015; NMRC/CIRG/1488/2018 and NMRC/OFLCG/004/2018); (4) the MEC cohort (supported by NMRC grant 0838/2004, BMRC grants 03/1/27/18/216, 05/1/21/19/425 and 11/1/21/19/678, Ministry of Health, Singapore, National University of Singapore and National University Health System, Singapore); (5) the SingHealth Duke–NUS Institute of Precision Medicine (PRISM) cohort (supported by NMRC/CG/M006/2017_NHCS, NMRC/StaR/0011/2012, NMRC/StaR/0026/2015, Lee Foundation and Tanoto Foundation); (6) the TTSH Personalised Medicine Normal Controls (TTSH) cohort (supported by NMRC/CG12AUG17 and CGAug16M012). The views expressed are those of the author(s) and not necessarily those of the National Precision Medicine investigators, or institutional partners.
We are grateful to all Fenland volunteers and to the general practitioners and practice staff for assistance with recruitment. We thank the Fenland Study Investigators, Fenland Study Co-ordination team and the Epidemiology Field, Data and Laboratory teams. Proteomic measurements were supported and governed by a collaboration agreement between the University of Cambridge and SomaLogic. The Fenland Study (10.22025/2017.10.101.00001) is funded by the Medical Research Council (MC_UU_12015/1). We further acknowledge support for genomics from the Medical Research Council (MC_PC_13046). ORCADES was supported by the Chief Scientist Office of the Scottish Government (CZB/4/276 and CZB/4/710), a Royal Society URF to J.F.W., the MRC Human Genetics Unit quinquennial programme ‘QTL in Health and Disease’, Arthritis Research UK and the European Union framework program 6 EUROSPAN project (contract no. LSHG-CT-2006-018947). DNA extractions were performed at the Edinburgh Clinical Research Facility, University of Edinburgh.
We would like to acknowledge the contributions of the research nurses in Orkney, the administrative team in Edinburgh and the people of Orkney. The Viking Health Study Shetland (VIKING) was supported by the MRC Human Genetics Unit quinquennial programme grant ‘QTL in Health and Disease’. DNA extractions and genotyping were performed at the Edinburgh Clinical Research Facility, University of Edinburgh. We would like to acknowledge the contributions of the research nurses in Shetland, the administrative team in Edinburgh and the people of Shetland. We acknowledge support from the MRC Human Genetics Unit programme grant, ‘Quantitative traits in health and disease’ (U. MC_UU_00007/10). Whole-genome sequencing for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). Whole-genome sequencing for ‘NHLBI TOPMed: Jackson Heart Study’ (phs000964) was performed at the Northwest Genomics Center (HHSN268201100037C). Core support including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support including phenotype harmonization, data management, sample-identity QC and general program coordination were provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I). We acknowledge the studies and participants who provided biological samples and data for TOPMed. The Jackson Heart Study (JHS) is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), the Mississippi State Department of Health (HHSN268201800015I) and the University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I and HHSN268201800012I) contracts from the NHLBI and the National Institute on Minority Health and Health Disparities (NIMHD). 
We also thank the staff and participants of the JHS. JHS disclaimer: the views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the US Department of Health and Human Services. This work was also funded by the Swedish Research Council (2019-01497) and the Swedish Heart-Lung foundation (20200687). Y.X. and M.I. were supported by the UK Economic and Social Research Council (ES/T013192/1). S.C.R., L.L., C.F. and E.P. are funded by a BHF Programme Grant (RG/18/13/33946). C.L., M.P. and J.L. were funded by the Medical Research Council (MC_UU_00006/1 – Aetiology and Mechanisms). S.A.L. was supported by a Canadian Institutes of Health Research postdoctoral fellowship (MFE-171279). U.A.T. is supported by a US National Institutes of Health Mentored Clinical Scientist Development Award program (1K08HL161445-01A1). C.F. was supported by the Health Data Research UK. E.P. was funded by the EU/EFPIA Innovative Medicines Initiative Joint Undertaking BigData@Heart grant 116074, NIHR BTRU in Donor Health and Genomics (NIHR BTRU-2014-10024) and NIHR BTRU in Donor Health and Behaviour (NIHR203337). J.E.P. was supported by a Medical Research Foundation grant (MRF-042-0001-RG-PETE-C0839). E.E.D. is supported by a Wellcome Trust grant (206194, 220540/Z/20/A). R.E.G. is supported by a US National Institutes of Health grant for proteomics in the Jackson Heart Study (R01 HL133870) and an NIH contract to perform proteomics and metabolomics in multiple cohorts (HHSN268201600034I). J.D. holds a British Heart Foundation Professorship and a NIHR Senior Investigator Award. M.I. is supported by the Munz Chair of Cardiovascular Prediction and Prevention and the NIHR Cambridge Biomedical Research Centre (NIHR203312). This study was supported by the Victorian Government’s Operational Infrastructure Support (OIS) program. 
This work was supported by Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and Wellcome. This research was supported by an HDRUK Director’s Innovation Award (HDRUK2022.0130). We acknowledge B. Sun and T. Jiang for previous analyses of INTERVAL SomaScan and genotype QC, respectively. For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any author accepted manuscript version arising from this submission. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, NHSBT or the Department of Health and Social Care.

Author information

Contributions

Y.X. and M.I. conceived and designed the study. Y.X. and S.C.R. performed the genetic-score training and internal validation analyses. Y.X., Y.L., P.R.H.J.T., M.P., U.A.T., S.M.-W., Å.J., P.S. and S.D. performed the external validation analyses. S.C.R., A.P.N., E.P., J.E.P., C.O.-W. and B.P. performed the data QC and GWAS in INTERVAL. Y.X. and L.L. performed the methods benchmarking analyses. Y.X. and S.A.L. performed the PheWAS. Y.X. performed the pathway coverage, correlation and PCA analyses. Y.X. and S.C.R. performed the cross-platform validation analyses. S.C.R. performed the genetic-score polygenicity analysis. C.F. and M.I. interpreted the biological insights. Y.X. developed and maintained the online portal. M.I., A.S.B., J.F.W., C.L., X.S., J.D., R.E.G., D.S.P., E.E.D., R.M.v.D., E.S.T., E.D.A., N.S., L.B. and J.L. acquired the resources and datasets; M.I., A.S.B., J.F.W., C.L., X.S., J.D., A.M., R.E.G., C.Y., D.S.P., E.E.D., H.P. and N.P. supervised the work. Y.X. and M.I. wrote the original manuscript. All authors reviewed and approved the final paper.

Corresponding authors

Correspondence to Yu Xu or Michael Inouye.

Ethics declarations

Competing interests

During the drafting of the manuscript, P.R.H.J.T. became a part-time employee of BioAge Labs, P.S. became a full-time employee of GSK and D.S.P. became a full-time employee of AstraZeneca. L.B. is an employee of BioMarin. J.D. serves on scientific advisory boards for AstraZeneca, Novartis and UK Biobank, and has received multiple grants from academic, charitable and industry sources outside of the submitted work. A.M. is an employee of Pfizer. A.S.B. reports institutional grants from AstraZeneca, Bayer, Biogen, BioMarin, Bioverativ, Novartis, Regeneron and Sanofi.

Peer review

Peer review information

Nature thanks Heiko Runz, Bjarni Vilhjálmsson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Schematic framework for the development and validation of multi-omic genetic scores.

This figure presents the overall study design for the development of genetic scores for multi-omic traits across five platforms (Nightingale, Metabolon, Olink, SomaScan and RNA-seq) using INTERVAL data as well as their validation in seven external cohorts of multiple ancestries (European, Asian-Chinese, Asian-Malay, Asian-Indian and African American).

Extended Data Fig. 2 R² performance comparison between Bayesian ridge, LDpred2 and P+T for Metabolon traits in external validation (INTERVAL withheld set).

This figure compares the R² performance between Bayesian ridge (BR; on the set of genome-wide variants with p-value < 5 × 10⁻⁸; x-axis) and LDpred2 (HapMap3 variant set), and between BR and P+T (variant sets at two p-value thresholds: 5 × 10⁻⁸ and 1 × 10⁻³) for 20 randomly selected Metabolon traits in external validation (INTERVAL withheld set; Methods). P-values in the GWAS for omic traits were derived from t-tests in linear regression, and all tests were two-sided.

Extended Data Fig. 3 Distribution of the number of variants in the genetic scores and the correlations between performance (R²) of genetic scores and the number of variants comprising the score.

The density plots show the distribution of the number of variants comprising the genetic scores for each platform. The scatter plots show how R² in internal validation varies with the number of variants in the genetic-score model.

Extended Data Fig. 4 Validation of genetic scores in external European cohorts.

The scatter plots compare the Spearman correlations between internal validation and external validation in a European cohort for each platform. Points are coloured by the variant missingness rate in the external cohort, and the blue line shows the linear model fitted to the data points. This analysis included all the genetic scores developed in this study.

Extended Data Fig. 5 Validation of the performance change of genetic scores by their variant missing rates in external cohorts of different ancestries.

External validation results in European cohorts were merged within each platform to increase statistical power in this analysis; these included the NSPHS and ORCADES validations for Olink, and the ORCADES and VIKINGS validations for Nightingale. Note that the INTERVAL withheld subset validations and the UKB validation for Nightingale traits were excluded from this analysis because there was little or no variant missingness in the external cohort (or INTERVAL withheld subset). Validation results in each platform were ranked by the variant missing rate of the genetic-score models in the external cohort and grouped into tertiles, where the variant missing rate is the number of variants missing in the validation cohort divided by the total number of variants in the genetic score. This figure presents the mean and standard error (SE) of the change in R² performance of genetic scores between internal and external validation across tertiles of validation results. The analysis included validation results for 2,129 SomaScan, 603 Olink, 455 Metabolon and 423 Nightingale traits (the same trait may appear in multiple validation cohorts for a platform) for European (EUR); 2,047 SomaScan and 139 Nightingale traits for Chinese (CN), Indian (IN) and Malay (MA); and 820 SomaScan traits for African American (AF).
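The tertile grouping described in this caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the record fields, function names and toy numbers are hypothetical, and the performance change is taken here to be external R² minus internal R².

```python
import math
import statistics


def missing_rate(n_missing, n_total):
    """Variants missing in the validation cohort / total variants in the score."""
    return n_missing / n_total


def tertile_summary(results):
    """Rank results by missing rate, split them into three groups and
    return a (mean, standard error) pair of the R2 change for each group.

    Each result is a dict with hypothetical keys:
    n_missing, n_total, r2_internal, r2_external.
    """
    ranked = sorted(
        results, key=lambda r: missing_rate(r["n_missing"], r["n_total"])
    )
    n = len(ranked)
    bounds = [0, n // 3, (2 * n) // 3, n]
    summary = []
    for lo, hi in zip(bounds, bounds[1:]):
        # Assumption: change = external R2 minus internal R2.
        changes = [r["r2_external"] - r["r2_internal"] for r in ranked[lo:hi]]
        mean = statistics.mean(changes)
        # Standard error of the mean; undefined for fewer than two values.
        if len(changes) > 1:
            se = statistics.stdev(changes) / math.sqrt(len(changes))
        else:
            se = float("nan")
        summary.append((mean, se))
    return summary


# Toy validation results (made-up values, for illustration only).
toy_results = [
    {"n_missing": 0, "n_total": 100, "r2_internal": 0.30, "r2_external": 0.28},
    {"n_missing": 2, "n_total": 100, "r2_internal": 0.20, "r2_external": 0.19},
    {"n_missing": 10, "n_total": 100, "r2_internal": 0.25, "r2_external": 0.20},
    {"n_missing": 12, "n_total": 100, "r2_internal": 0.15, "r2_external": 0.11},
    {"n_missing": 40, "n_total": 100, "r2_internal": 0.30, "r2_external": 0.15},
    {"n_missing": 50, "n_total": 100, "r2_internal": 0.20, "r2_external": 0.08},
]
summary = tertile_summary(toy_results)
for label, (mean, se) in zip(("low", "middle", "high"), summary):
    print(f"{label} missingness tertile: mean change {mean:.3f} (SE {se:.3f})")
```

In this toy data the mean R² change becomes more negative from the lowest to the highest missingness tertile, which is the kind of trend the figure is designed to reveal.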

Extended Data Fig. 6 Performance (R²) of genetic scores for Nightingale and SomaScan in external cohorts of various ancestries relative to R² in internal validation (INTERVAL).

a, Nightingale; b, SomaScan. Transferability was tested only if the genetic score had a significant association (two-sided t-test; Bonferroni-corrected p-value < 0.05 across all 17,227 omic traits tested) with the directly measured molecular trait in internal validation (n = 1,631, 7,471, 964, 635 and 827 for Metabolon, Nightingale, Olink, SomaScan and RNA-seq traits, respectively). This resulted in 137 and 136 Nightingale metabolic traits for UKB (n = 98,245 participants) and MEC (Chinese, n = 1,067; Indian, n = 654; Malay, n = 634), respectively, and 949, 1,052 and 378 SomaScan proteins for FENLAND (n = 8,832), MEC (Chinese, n = 645; Indian, n = 564; Malay, n = 563) and JHS (n = 1,852), respectively. Violin plots show the distributions of the R² ratios. Black points show mean values and error bars show standard errors.
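The selection and ratio steps in this caption can be illustrated with a short Python sketch. It assumes the Bonferroni correction multiplies each p-value by the 17,227 traits tested and that the plotted quantity is external R² divided by internal R²; the dictionary keys, function names and toy values are hypothetical and not taken from the study.

```python
import math
import statistics

N_TRAITS_TESTED = 17_227  # number of omic traits tested, from the caption
ALPHA = 0.05


def passes_bonferroni(p_value, n_tests=N_TRAITS_TESTED, alpha=ALPHA):
    """True if the Bonferroni-corrected p-value (p * n_tests, capped at 1)
    is below alpha."""
    return min(1.0, p_value * n_tests) < alpha


def r2_ratios(scores):
    """External / internal R2 for scores significant in internal validation.

    Each score is a dict with hypothetical keys:
    p_internal, r2_internal, r2_external.
    """
    return [
        s["r2_external"] / s["r2_internal"]
        for s in scores
        if passes_bonferroni(s["p_internal"]) and s["r2_internal"] > 0
    ]


def mean_and_se(values):
    """Mean and standard error of the mean (needs at least two values)."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


# Toy genetic scores (made-up values, for illustration only).
toy_scores = [
    {"p_internal": 1e-10, "r2_internal": 0.25, "r2_external": 0.20},
    {"p_internal": 1e-5, "r2_internal": 0.02, "r2_external": 0.01},
    {"p_internal": 2e-6, "r2_internal": 0.10, "r2_external": 0.05},
]
ratios = r2_ratios(toy_scores)
mean_ratio, se_ratio = mean_and_se(ratios)
print(f"{len(ratios)} scores kept; mean R2 ratio {mean_ratio:.2f} (SE {se_ratio:.2f})")
```

The second toy score is dropped because its corrected p-value (about 0.17) exceeds 0.05, so only two ratios contribute to the mean and standard error.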

Extended Data Fig. 7 Performance (R²) of genetic scores between longitudinal samples and across ancestries in the MEC cohort.

Paired samples include a baseline and a revisit sample from each individual run on SomaScan and Nightingale for MEC Chinese (N = 403 and 721 individuals), MEC Indian (N = 356 and 376) and MEC Malay (N = 353 and 363). Blue lines denote linear models fitted to each set of data points and the shaded areas represent 95% confidence intervals where applicable. There are no Nightingale genetic scores with an R² > 0.15 in both internal and MEC validation, so a–c show R² only in the range [0, 0.15] for clarity. The inset box plots at the bottom right of d–f show the validation results for traits with baseline validation performance (R²) between 0 and 0.025 in each ancestry.

Extended Data Fig. 8 Coverage analysis for blood proteins in the lowest-level pathways.

This analysis considered all the lowest-level pathways of the super-pathways curated in Reactome. A lowest-level pathway was considered covered by the proteins of this study if at least one protein with a genetic score was among its entities. This figure shows, for each super-pathway, the percentage of its lowest-level pathways covered by each group of proteins (grouped by R² in internal validation).
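The coverage rule in this caption can be sketched in Python as follows. This is an illustrative sketch only: the pathway and protein identifiers are placeholders rather than real Reactome entries, the function names are hypothetical, and grouping proteins by a minimum R² threshold is an assumption about how the R² groups might be formed.

```python
def pathway_coverage(pathways, scored_proteins):
    """Percentage of lowest-level pathways that contain at least one
    protein with a genetic score.

    pathways: dict mapping a lowest-level pathway name to a set of proteins.
    scored_proteins: set of proteins that have a genetic score.
    """
    if not pathways:
        return 0.0
    covered = sum(1 for members in pathways.values() if members & scored_proteins)
    return 100.0 * covered / len(pathways)


def coverage_by_super_pathway(super_pathways, protein_r2, min_r2=0.0):
    """Coverage per super-pathway, using proteins whose internal-validation
    R2 is at least min_r2 (assumed grouping rule)."""
    scored = {p for p, r2 in protein_r2.items() if r2 >= min_r2}
    return {
        name: pathway_coverage(pathways, scored)
        for name, pathways in super_pathways.items()
    }


# Placeholder hierarchy: super-pathway -> lowest-level pathway -> proteins.
toy_super_pathways = {
    "super_1": {
        "path_a": {"P1", "P2"},
        "path_b": {"P3"},
        "path_c": {"P4"},
        "path_d": {"P5", "P1"},
    },
    "super_2": {
        "path_e": {"P6"},
        "path_f": {"P2"},
    },
}
# Made-up internal-validation R2 for proteins with a genetic score.
toy_protein_r2 = {"P1": 0.4, "P2": 0.05, "P3": 0.02}

coverage_all = coverage_by_super_pathway(toy_super_pathways, toy_protein_r2)
coverage_strong = coverage_by_super_pathway(
    toy_super_pathways, toy_protein_r2, min_r2=0.1
)
print(coverage_all)
print(coverage_strong)
```

Raising the R² threshold shrinks the set of proteins counted, so coverage can only stay the same or fall, which is why the figure reports coverage separately for each R² group.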

Extended Data Fig. 9 Key features of the OmicsPred portal for accessing genetic scores of multi-omic traits.

a, Organization of genetic scores on the portal. b, Example of how biomolecular traits and their genetic-score-related information can be explored. c, Example of how summary statistics of training and validation cohorts are presented. d, Example of how validation results and genetic-score models can be downloaded. e, Example of how validation results and trait-related information can be visualized.

Extended Data Table 1 Demographic statistics of training and validation samples for the construction of genetic scores of blood biomolecular traits by platform

Supplementary information

Supplementary Information

This file contains Supplementary Figures 1–30 and Supplementary Table legends.

Reporting Summary

Peer Review File

Supplementary Tables

This file contains Supplementary Tables 1–11.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xu, Y., Ritchie, S.C., Liang, Y. et al. An atlas of genetic scores to predict multi-omic traits. Nature 616, 123–131 (2023). https://doi.org/10.1038/s41586-023-05844-9

  • DOI: https://doi.org/10.1038/s41586-023-05844-9
