

Genetics meets proteomics: perspectives for large population-based studies

Abstract

Proteomic analysis of cells, tissues and body fluids has generated valuable insights into the complex processes influencing human biology. Proteins represent intermediate phenotypes for disease and provide insight into how genetic and non-genetic risk factors are mechanistically linked to clinical outcomes. Associations between protein levels and DNA sequence variants that colocalize with risk alleles for common diseases can expose disease-associated pathways, revealing novel drug targets and translational biomarkers. However, genome-wide, population-scale analyses of proteomic data are only now emerging. Here, we review current findings from studies of the plasma proteome and discuss their potential for advancing biomedical translation through the interpretation of genome-wide association analyses. We highlight the challenges faced by currently available technologies and provide perspectives relevant to their future application in large-scale biobank studies.
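The basic pQTL association described above (protein levels tested against DNA sequence variants) can be illustrated with a minimal sketch. It assumes an additive genetic model in which the protein level is regressed on the number of effect alleles (0, 1 or 2) by ordinary least squares, for one variant and one protein, with no covariates. Published pQTL studies additionally adjust for factors such as age, sex and genetic principal components, typically using dedicated association software. The function names `pqtl_test` and `approx_p_value` are hypothetical, and the p value uses a large-sample normal approximation to the t distribution rather than the exact t distribution.

```python
import math


def pqtl_test(dosages, protein_levels):
    """Single-variant, single-protein association under an additive model.

    dosages: effect-allele counts per sample (0, 1, 2, or imputed values in [0, 2]).
    protein_levels: protein measurements for the same samples, in the same order
        (for example, log-transformed relative abundances).
    Returns (beta, standard_error, t_statistic) from an ordinary least-squares fit
    of protein_level = intercept + beta * dosage + error, with no covariates.
    """
    n = len(dosages)
    if n != len(protein_levels) or n < 3:
        raise ValueError("need matched genotype and protein vectors with n >= 3")
    mean_g = sum(dosages) / n
    mean_p = sum(protein_levels) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("monomorphic variant: genotype has no variation")
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(dosages, protein_levels))
    beta = sxy / sxx  # change in protein level per additional effect allele
    intercept = mean_p - beta * mean_g
    rss = sum((p - intercept - beta * g) ** 2 for g, p in zip(dosages, protein_levels))
    sigma2 = rss / (n - 2)  # residual variance; two parameters were estimated
    se = math.sqrt(sigma2 / sxx)
    if se == 0:
        raise ValueError("perfect fit: standard error is zero")
    return beta, se, beta / se


def approx_p_value(t_statistic):
    """Two-sided p value from a normal approximation (reasonable for large n)."""
    return math.erfc(abs(t_statistic) / math.sqrt(2))


if __name__ == "__main__":
    # Toy data for illustration only, not real measurements.
    g = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
    p = [5.1, 4.9, 5.6, 5.4, 5.8, 6.2, 6.0, 5.0, 5.5, 6.3]
    beta, se, t = pqtl_test(g, p)
    print(f"beta={beta:.3f} se={se:.3f} t={t:.2f} p~{approx_p_value(t):.2e}")
```

In a genome-wide scan, this test is repeated for every variant-protein pair. Associations are conventionally called genome-wide significant at p < 5 × 10^-8, and studies that measure many proteins often apply a stricter threshold to account for the additional tests.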


Fig. 1: Key concepts of pQTL studies.
Fig. 2: Ways a genetic variant can lead to a pQTL.


Acknowledgements

K.S. is supported by the Biomedical Research Program at Weill Cornell Medicine in Qatar, a programme funded by the Qatar Foundation. J.M.S. is supported by the KTH Center for Applied Precision Medicine funded by the Erling Persson Family Foundation and acknowledges the Knut and Alice Wallenberg Foundation for funding the Human Protein Atlas. J.M.S. and M.I.M. acknowledge the Innovative Medicines Initiative Joint Undertaking under grant agreement no. 115317 (DIRECT), the resources of which are composed of a financial contribution from the European Union’s Seventh Framework Programme and an in-kind contribution from EFPIA companies. The views expressed in this article are those of the authors and not necessarily those of the UK NHS, the UK NIHR, the UK Department of Health or the Qatar Foundation.

Author information

Contributions

K.S. and J.M.S. researched data for the article. All authors contributed to the discussion of content, writing the article and reviewing/editing the manuscript before submission.

Corresponding authors

Correspondence to Karsten Suhre or Jochen M. Schwenk.

Ethics declarations

Competing interests

M.I.M. has served on advisory panels for Pfizer, NovoNordisk and Zoe Global, has received honoraria from Merck, Pfizer, NovoNordisk and Eli Lilly, has stock options in Zoe Global and has received research funding from AbbVie, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Janssen, Merck, NovoNordisk, Pfizer, Roche, Sanofi Aventis, Servier and Takeda. As of June 2019, M.I.M. is an employee of Genentech and holds stock in Roche. K.S. and J.M.S. declare no competing interests.

Additional information

Peer review information

Nature Reviews Genetics thanks M. Altelaar and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

A table of all published GWAS with proteomics: http://www.metabolomix.com/a-table-of-all-published-gwas-with-proteomics/

Human Plasma Proteome Project: https://www.hupo.org/plasma-proteome-project

Human Proteome Map: http://www.humanproteomemap.org

In vitro test systems that have been categorized by the FDA: https://www.fda.gov/medical-devices/medical-device-databases/clinical-laboratory-improvement-amendments-download-data

MRbase: a database and analytical platform for Mendelian randomization: http://www.mrbase.org

neXtProt, a knowledgebase on human proteins: https://www.nextprot.org

PeptideAtlas: http://www.peptideatlas.org

pGWAS server: connecting genetic risk to disease end points through the human blood plasma proteome: http://proteomics.gwas.eu

PhenoScanner: a database of human genotype-phenotype associations: http://www.phenoscanner.medschl.cam.ac.uk

ProteomeXchange, a consortium to provide coordinated data submission and dissemination of proteomics repositories: http://www.proteomexchange.org

ProteomicsDB: https://www.proteomicsdb.org

SNiPA: a tool for annotating and browsing genetic variants: http://snipa.org

SomaLogic white paper on SOMAmer specificity: https://somalogic.com/technology/our-platform/somamer-specificity/

The Genome Aggregation Database (gnomAD) browser: https://gnomad.broadinstitute.org/

The Genotype-Tissue Expression (GTEx) project portal: https://gtexportal.org

The Human Protein Atlas: https://www.proteinatlas.org

The Human Protein Atlas: the proteins actively secreted to human blood: https://www.proteinatlas.org/humanproteome/blood/secreted+to+blood

The NHGRI-EBI catalogue of published genome-wide association studies: https://www.ebi.ac.uk/gwas

Uniprot Proteomes — Homo sapiens: https://www.uniprot.org/proteomes/UP000005640

Glossary

Colocalization

Two genetic associations are said to be colocalized if the strengths of their statistical associations covary across the variants at a genetic locus, suggesting that a shared causal variant underlies both associations.
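
As a rough illustration only, the covariation of association strengths can be sketched in Python by correlating the absolute z-scores of a protein association and a disease association over the same ordered variants at one locus. This is a crude heuristic, not a formal colocalization test (Bayesian methods, such as the one implemented in the coloc R package, are typically used for that); the function names and z-scores below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def association_covariation(z_protein, z_disease):
    """Correlate absolute association z-scores of two traits measured over the
    same variants (in the same order) at one locus. A crude heuristic only,
    not a formal colocalization test."""
    return pearson([abs(z) for z in z_protein], [abs(z) for z in z_disease])


# Hypothetical z-scores for five variants at one locus
z_pqtl = [1.2, 3.5, 8.9, 4.1, 0.7]
z_gwas = [0.9, 2.8, 6.5, 3.3, 0.4]
print(round(association_covariation(z_pqtl, z_gwas), 3))
```

A value close to 1 means the two association signals rise and fall together across the locus, which is the pattern the definition describes; it says nothing on its own about whether a single causal variant is shared.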

Protein QTLs

(pQTLs). A protein quantitative trait locus (pQTL) is a genetic locus at which variants are associated with protein levels; it is often represented by the most strongly associated single-nucleotide polymorphism.

pQTL studies

Genome-wide association studies in which the dependent variables are protein levels measured using a proteomics approach. Loci identified as associated with protein levels are termed ‘protein quantitative trait loci’ (pQTLs).
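
To make the idea concrete, the per-variant test at the heart of a pQTL study can be sketched in Python as an ordinary least-squares regression of protein level on allele dosage, repeated over variants. This is a simplified, hypothetical sketch: real pQTL studies typically adjust for covariates such as age, sex and genetic principal components, account for relatedness, and convert test statistics to P values judged against a genome-wide significance threshold, none of which is shown here.

```python
import math


def pqtl_test(dosages, protein_levels):
    """Simple linear regression of protein level on allele dosage (0, 1 or 2
    copies of the effect allele) at one variant.
    Returns (beta, standard error, t statistic)."""
    n = len(dosages)
    if n != len(protein_levels) or n < 3:
        raise ValueError("need matching sequences with at least 3 samples")
    mg = sum(dosages) / n
    mp = sum(protein_levels) / n
    sgg = sum((g - mg) ** 2 for g in dosages)
    if sgg == 0:
        raise ValueError("variant is monomorphic in this sample")
    sgp = sum((g - mg) * (p - mp) for g, p in zip(dosages, protein_levels))
    beta = sgp / sgg
    alpha = mp - beta * mg
    rss = sum((p - alpha - beta * g) ** 2 for g, p in zip(dosages, protein_levels))
    se = math.sqrt(rss / (n - 2) / sgg)
    return beta, se, beta / se


def pqtl_scan(genotypes, protein_levels):
    """Test every variant (dict of variant ID -> dosages) against one protein."""
    return {vid: pqtl_test(d, protein_levels) for vid, d in genotypes.items()}


# Hypothetical protein levels and genotypes for six individuals
protein = [1.0, 1.2, 2.0, 2.1, 3.0, 2.9]
genotypes = {"variant_A": [0, 0, 1, 1, 2, 2], "variant_B": [1, 0, 2, 1, 0, 1]}
for vid, (beta, se, t) in pqtl_scan(genotypes, protein).items():
    print(vid, round(beta, 3), round(se, 3), round(t, 2))
```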

Open reading frames

Portions of DNA that can be translated into protein and that are terminated by a stop codon.

Post-translational modifications

Biochemical modification of the primary peptide sequence, typically by covalent addition of a chemical group, as in phosphorylation and glycosylation. Post-translational modifications can change the accessibility of a protein epitope and potentially influence the binding of affinity reagents.

Data-dependent acquisition

(DDA). A data acquisition mode used in mass spectrometry analysis in which only a selected subset of peptides, those yielding the most intense peptide ions, is fragmented and analysed.

Data-independent acquisition

(DIA). A data acquisition mode used in mass spectrometry analysis in which all peptides detected within a given mass-to-charge ratio window are fragmented and analysed.
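
The contrast between the two acquisition modes can be sketched, in a highly simplified and hypothetical form, as two ways of choosing which precursor ions to fragment: DDA picks the most intense ones, whereas DIA steps through mass-to-charge windows and fragments everything inside each. The sketch ignores features of real instrument methods, such as dynamic exclusion in DDA and variable or overlapping isolation windows in DIA; the class, function names, m/z values and intensities are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precursor:
    mz: float         # mass-to-charge ratio of the peptide ion
    intensity: float  # intensity in the survey (MS1) scan


def dda_select(precursors, top_n):
    """DDA-style choice: fragment only the top_n most intense precursor ions."""
    return sorted(precursors, key=lambda p: p.intensity, reverse=True)[:top_n]


def dia_windows(precursors, start, stop, width):
    """DIA-style choice: step through consecutive m/z windows and fragment all
    precursors falling in each window together, regardless of intensity."""
    windows = []
    lo = start
    while lo < stop:
        hi = lo + width
        windows.append([p for p in precursors if lo <= p.mz < hi])
        lo = hi
    return windows


ions = [Precursor(400.2, 5e6), Precursor(420.7, 1e5),
        Precursor(455.1, 8e6), Precursor(480.3, 2e4)]
print([p.mz for p in dda_select(ions, top_n=2)])
print([len(w) for w in dia_windows(ions, start=400, stop=500, width=25)])
```

In this toy example the two low-intensity ions are never selected by the DDA rule, while every ion ends up in some DIA window, which mirrors the "selected set" versus "all peptides" distinction in the two definitions.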

Aptamers

Short, single-stranded (and possibly chemically modified) oligonucleotides that are selected from a synthetic library of sequences to recognize a specific target protein with high affinity (for example, via structural elements).

cis-pQTLs

A protein quantitative trait locus (pQTL) located at or near the gene that encodes the associated protein; an ad hoc distance cut-off is often used to distinguish cis-pQTLs from trans-pQTLs. A cis-pQTL suggests a direct influence of a genetic variant at that locus on protein expression or turnover.

trans-pQTLs

A protein quantitative trait locus (pQTL) located far from the gene that encodes the associated protein, or on another chromosome. A trans-pQTL indicates an indirect link between the genetic locus and protein expression or turnover.
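
Because the cis/trans distinction rests on a distance cut-off, it can be expressed in a few lines of Python. In this hypothetical sketch, a gene is reduced to a single reference coordinate and the 1 Mb cut-off is only an illustrative value; studies differ in how they define both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    chrom: str
    pos: int  # base-pair coordinate (for a gene, one assumed reference point)


def classify_pqtl(lead_variant, protein_gene, cutoff_bp):
    """Label a pQTL 'cis' if its lead variant lies on the same chromosome as,
    and within cutoff_bp of, the gene encoding the protein; otherwise 'trans'."""
    same_chrom = lead_variant.chrom == protein_gene.chrom
    if same_chrom and abs(lead_variant.pos - protein_gene.pos) <= cutoff_bp:
        return "cis"
    return "trans"


CUTOFF_BP = 1_000_000  # illustrative ad hoc cut-off, not a recommendation
gene = Locus("1", 50_000_000)
print(classify_pqtl(Locus("1", 50_400_000), gene, CUTOFF_BP))
print(classify_pqtl(Locus("1", 53_000_000), gene, CUTOFF_BP))
print(classify_pqtl(Locus("7", 50_000_000), gene, CUTOFF_BP))
```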

Linkage disequilibrium

Two genetic loci are in linkage disequilibrium if their alleles are correlated within a population. Because little recombination occurs between such loci, they are commonly co-inherited as a haplotype.
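
For unphased genotype data, this correlation is commonly summarized as r², the squared correlation between allele dosages at two loci, which is used as an approximation to the haplotype-based measure. A minimal sketch, assuming dosages coded as 0, 1 or 2 copies of one allele and measured in the same individuals; the example genotypes are hypothetical.

```python
import math


def genotype_r2(g1, g2):
    """Squared Pearson correlation between allele dosages (0/1/2) at two loci,
    measured in the same individuals."""
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("need two dosage vectors of equal length >= 2")
    m1, m2 = sum(g1) / n, sum(g2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    s11 = sum((a - m1) ** 2 for a in g1)
    s22 = sum((b - m2) ** 2 for b in g2)
    if s11 == 0 or s22 == 0:
        raise ValueError("r2 is undefined for a monomorphic locus")
    return s12 ** 2 / (s11 * s22)


locus_a = [0, 1, 2, 0, 1, 2]
print(genotype_r2(locus_a, [0, 1, 2, 0, 1, 2]))  # fully correlated loci
print(genotype_r2(locus_a, [2, 1, 0, 2, 1, 0]))  # opposite allele coding
print(genotype_r2(locus_a, [0, 0, 0, 2, 2, 2]))  # uncorrelated loci
```

Squaring the correlation makes the measure independent of which allele is counted at each locus, which is why the second example also gives complete linkage disequilibrium.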

Mendelian randomization

A method that uses genetic variants as instrumental variables to estimate the unconfounded effect of an exposure (for example, protein level) on an outcome (for example, disease risk).
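
Under the usual instrumental-variable assumptions (the variants are associated with the exposure, are independent of confounders and affect the outcome only through the exposure), the estimate can be sketched with two standard summary-statistic estimators: the Wald ratio for a single variant and the fixed-effect inverse-variance-weighted (IVW) combination across independent variants. The numbers below are hypothetical, and the sketch omits the sensitivity analyses (for example, for pleiotropy) that applied studies usually add.

```python
import math


def wald_ratio(beta_exposure, beta_outcome):
    """Single-variant causal estimate: outcome effect divided by exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted (IVW) estimate combining several
    independent variants; returns (estimate, standard error)."""
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, 1 / math.sqrt(den)


# Hypothetical summary statistics for three independent pQTL variants:
# effect on protein level (bx), effect on disease log-odds (by) and its SE
bx = [0.20, 0.40, 0.10]
by = [0.10, 0.21, 0.04]
se_y = [0.02, 0.05, 0.01]
print(round(wald_ratio(bx[0], by[0]), 3))
estimate, se = ivw_estimate(bx, by, se_y)
print(round(estimate, 3), round(se, 3))
```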

Gaussian graphical models

(GGMs). Network representations of the partial correlations between a set of quantitative variables, here the protein levels. Partial correlations used in a protein GGM can be viewed as the amount of pairwise correlation between the levels of two proteins that remains when the contributions of all other proteins are accounted for.
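
The partial correlations that define the edges of a GGM can be computed from the inverse of the covariance matrix (the precision matrix P) as rho_ij = -P_ij / sqrt(P_ii * P_jj). The pure-Python sketch below is illustrative only and its function names are hypothetical: with many proteins and comparatively few samples the covariance matrix can be singular or unstable, in which case regularized estimators such as the graphical lasso are typically used, together with a significance or sparsity criterion for deciding which edges to keep.

```python
import math


def invert(matrix):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def covariance(samples):
    """Sample covariance matrix; each sample is a list of protein levels."""
    n, k = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(k)]
    return [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
             for j in range(k)] for i in range(k)]


def partial_correlations(cov):
    """Partial correlation of each protein pair given all other proteins,
    from the precision matrix P: -P_ij / sqrt(P_ii * P_jj)."""
    prec = invert(cov)
    k = len(prec)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(k)] for i in range(k)]


# Three proteins with pairwise (marginal) correlations of 0.5
corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
print(round(partial_correlations(corr)[0][1], 3))
```

In this example each marginal correlation of 0.5 shrinks to a partial correlation of one third once the third protein is accounted for, which is the "remaining" correlation the definition refers to.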

Pleiotropic

A genetic locus is pleiotropic when one or more of its variants is associated with two or more seemingly unrelated phenotypic traits.

Epitope effect

The effect of a variant that alters an epitope on the binding of affinity reagents to their target antigen. The resulting difference in antigen recognition may be mistaken for a difference in protein abundance.

Polygenic risk scores

Combined risk scores derived from a weighted combination of genetic associations, possibly including millions of associations.
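
The weighted combination can be sketched as a sum of per-variant effect sizes multiplied by the number of effect alleles each person carries. This is a minimal, hypothetical sketch: the variant IDs and weights are invented, and practical scores also involve steps such as selecting or shrinking weights, harmonizing alleles between studies and imputing missing genotypes, which are omitted here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages: score = sum_i w_i * g_i.

    dosages: variant ID -> number of effect alleles (0, 1 or 2) in one person
    weights: variant ID -> per-allele effect size (e.g. log odds ratio)
    """
    missing = sorted(set(weights) - set(dosages))
    if missing:
        raise ValueError(f"no genotype for variants: {missing}")
    return sum(w * dosages[vid] for vid, w in weights.items())


# Hypothetical weights and one person's genotypes
weights = {"var1": 0.12, "var2": -0.05, "var3": 0.30}
person = {"var1": 2, "var2": 1, "var3": 0}
print(round(polygenic_score(person, weights), 2))
```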

About this article

Cite this article

Suhre, K., McCarthy, M.I. & Schwenk, J.M. Genetics meets proteomics: perspectives for large population-based studies. Nat Rev Genet 22, 19–37 (2021). https://doi.org/10.1038/s41576-020-0268-2

  • DOI: https://doi.org/10.1038/s41576-020-0268-2
