Resource profile and user guide of the Polygenic Index Repository

Abstract

Polygenic indexes (PGIs) are DNA-based predictors. Their value for research in many scientific disciplines is growing rapidly. As a resource for researchers, we used a consistent methodology to construct PGIs for 47 phenotypes in 11 datasets. To maximize the PGIs’ prediction accuracies, we constructed them using genome-wide association studies—some not previously published—from multiple data sources, including 23andMe and UK Biobank. We present a theoretical framework to help interpret analyses involving PGIs. A key insight is that a PGI can be understood as an unbiased but noisy measure of a latent variable we call the ‘additive SNP factor’. Regressions in which the true regressor is this factor but the PGI is used as its proxy therefore suffer from errors-in-variables bias. We derive an estimator that corrects for the bias, illustrate the correction, and make a Python tool for implementing it publicly available.
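The errors-in-variables point in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator and not the interface of their Python tool. It shows only the classical textbook case: the PGI equals a latent factor plus independent noise, and the reliability (the share of PGI variance due to the factor) is treated as known. The function name, parameter names and default values are hypothetical, chosen for illustration.

```python
import numpy as np


def simulate_attenuation(n=200_000, beta=0.5, reliability=0.4, seed=0):
    """Simulate regressing an outcome on a noisy proxy of a latent factor.

    Assumptions (illustrative only): the latent factor has variance 1, the
    measurement noise is independent of the factor, and the reliability
    var(factor) / var(pgi) is known rather than estimated.
    """
    rng = np.random.default_rng(seed)

    # Latent 'additive SNP factor' (the true regressor).
    factor = rng.standard_normal(n)

    # Noise variance chosen so that var(factor) / var(pgi) == reliability.
    noise_var = (1.0 - reliability) / reliability
    pgi = factor + rng.standard_normal(n) * np.sqrt(noise_var)

    # Outcome depends on the latent factor, not on the noisy PGI.
    y = beta * factor + rng.standard_normal(n)

    # OLS slope of y on the PGI: cov(pgi, y) / var(pgi).
    naive_slope = np.cov(pgi, y)[0, 1] / np.var(pgi, ddof=1)

    # Classical correction: divide the attenuated slope by the reliability.
    corrected_slope = naive_slope / reliability
    return naive_slope, corrected_slope


if __name__ == "__main__":
    naive, corrected = simulate_attenuation()
    print(f"naive slope: {naive:.3f}, corrected slope: {corrected:.3f}")
```

In this setup the naive slope is shrunk towards zero by roughly the reliability, and dividing by the reliability recovers a value close to the true coefficient. In real applications the reliability is not known and has to be estimated, which is one reason a dedicated estimator is needed.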

Fig. 1: Type of studies in presentations at BGA Annual Meetings.
Fig. 2: Algorithm determining which single-trait and multi-trait PGIs were generated for the repository.
Fig. 3: Predictive power of repository single-trait PGIs.

Data availability

For how to access the repository PGIs and other data from each participating dataset, see Supplementary Note; an up-to-date list of participating datasets and data access procedures is maintained at https://www.thessgac.org/pgi-repository. For each phenotype that we analyse, we report GWAS and MTAG summary statistics and PGI (LDpred) weights for all SNPs from the largest discovery sample for that analysis, unless the sample includes 23andMe. SNP-level summary statistics from analyses based entirely or in part on 23andMe data can only be reported for up to 10,000 SNPs. Therefore, if the largest GWAS or MTAG analysis for a phenotype includes 23andMe, we report summary statistics for only the genome-wide significant SNPs from that analysis. In addition, we report summary statistics for all SNPs from a version of the largest GWAS analysis that excludes 23andMe. Finally, we also report summary statistics and PGI (LDpred) weights on which the ‘public PGIs’ are based. These summary statistics and PGI weights can be downloaded from https://www.thessgac.org/pgi-repository. The data underlying Fig. 1 are also available at https://www.thessgac.org/pgi-repository. Researchers at non-profit institutions can obtain access to the genome-wide summary statistics from 23andMe used in this paper by completing the 23andMe publication dataset access request form, available at https://research.23andme.com/dataset-access/. Source data are provided with this paper.
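The reporting rule described above can be sketched as a small filter. This is only an illustration of the stated policy, not code from the repository. The `Snp` class, the function name, the significance threshold of 5e-8 (the conventional genome-wide value, not stated in the text) and the choice to keep the most significant SNPs if the cap were exceeded are all assumptions.

```python
from dataclasses import dataclass

GWS_THRESHOLD = 5e-8       # assumption: conventional genome-wide significance level
MAX_23ANDME_SNPS = 10_000  # cap stated in the data availability text


@dataclass
class Snp:
    rsid: str
    p_value: float


def reportable_snps(snps, includes_23andme):
    """Return the SNPs whose summary statistics may be reported.

    Without 23andMe data, all SNPs are reported. With 23andMe data, only
    genome-wide significant SNPs are reported, limited to the cap.
    """
    if not includes_23andme:
        return list(snps)

    significant = [s for s in snps if s.p_value < GWS_THRESHOLD]
    # Assumption: if the cap were exceeded, keep the most significant SNPs.
    significant.sort(key=lambda s: s.p_value)
    return significant[:MAX_23ANDME_SNPS]


if __name__ == "__main__":
    example = [Snp("rs1", 1e-9), Snp("rs2", 0.03), Snp("rs3", 4e-8)]
    print([s.rsid for s in reportable_snps(example, includes_23andme=True)])
```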

Code availability

The software used for the measurement-error correction is available at https://github.com/JonJala/pgi_correct. The code for constructing PGIs and principal components, the code for the illustrative application and the code for analysing the data displayed in Fig. 1 are at https://www.thessgac.org/pgi-repository.

Acknowledgements

The authors thank C. Shulman for helpful comments. This research was carried out under the auspices of the SSGAC. This research was conducted using the UKB resource under application number 11,425. J.B. was supported by the Pershing Square Fund of the Foundations of Human Behavior, awarded to D.L.; H.J., M.B., D.C. and P.T. by the Ragnar Söderberg Foundation (E42/15), to D.C.; C.A.P.B., P.K. and A.O. by an ERC Consolidator Grant (647648 EdGe), to P.K.; H.J., M.B., A.Y., J.P.B., M.N.M., D.C., D.J.B. and P.T. by Open Philanthropy (010623-00001), to D.J.B.; C.A.P.B., R.A. and S.O. by Riksbankens Jubileumsfond (P18-0782:1), to S.O.; C.A.P.B. and S.O. by the Swedish Research Council (2019-00244), to S.O.; G.G., N.W. and D.J.B. by the NIA/NIH (R24-AG065184 and R01-AG042568), to D.J.B.; D.J.B. by the NIA/NIH (R56-AG058726), to T. Galama; T.T.M., K.P.H., and E.M.T.-D. by the NIH/NICHD R01-HD083613 to E.M.T.-D. and R01-HD092548 to K.P.H.; P.T. by the NIA/NIMH (R01-MH101244-02 and U01-MH109539-02), to B. Neale. The study was also supported by the NIA/NIH (K99-AG062787-01, P.T.); Netherlands Organisation for Scientific Research VENI (016.Veni.198.058, A.O.); the Swedish Research Council (421-2013-1061, M.J.); the Government of Canada through Genome Canada and the Ontario Genomics Institute (OGI-152, J.P.B.); the Social Sciences and Humanities Research Council of Canada (J.P.B.); the European Union (MP1GI18418R, T.E.); the Estonian Research Council (PRG1291, T.E.); the National Health and Medical Research Council (GNT113400, P.M.V.); and the Australian Research Council (P.M.V.). 
The authors thank the following consortia for sharing GWAS summary statistics: Reproductive Genetics (ReproGen) Consortium for age at first menses; Genetics of Personality Consortium (GPC) for neuroticism, extraversion and openness; Psychiatric Genomics Consortium (PGC) for ADHD and depressive symptoms; Tobacco and Alcohol Genetics (TAG) Consortium for cigarettes per day and ever smoker; International Genomics of Alzheimer’s Project (IGAP) for Alzheimer’s disease; GWAS & Sequencing Consortium of Alcohol and Nicotine Use (GSCAN) for cigarettes per day, ever smoker and drinks per week; Genetic Investigation of Anthropometric Traits (GIANT) Consortium for height and body mass index; and Cognitive Genomics (COGENT) Consortium for cognitive performance. The authors thank the Neale Lab for making UKB GWAS results available for asthma, cannabis use, COPD, hayfever, life satisfaction (family, finance, friend and work), loneliness, migraine, nearsightedness, number ever born (men, women), religious attendance, self-rated health and subjective well-being. The authors thank the research participants and employees of 23andMe for making this work possible. A full list of acknowledgements is provided in the Supplementary Note.

Author information

Contributions

D.J.B., D.C., A.O., and P.T. designed and oversaw the study. A.O. supervised all analyses and led the writing of the manuscript. J.B. was the lead analyst, responsible for the GWAS and MTAG analyses, quality control of GWAS summary statistics and the PGI validation analyses. C.A.P.B. was responsible for quality control of genotype data and the construction of PGIs. G.G., N.W., H.J. and M.B. assisted with analyses. G.G. conducted the illustrative application and wrote the Python code. N.W. designed and implemented the algorithm used to generate Fig. 1. R.K.L. ran a meta-analysis of general risk tolerance omitting validation datasets. P.T. derived the measurement-error-correction estimator. A.K., D.A.H. and the 23andMe Research Group conducted genome-wide association analyses for 23andMe. The following authors shared genotype data to enable dataset participation in the Repository: K.M.H. for Add Health; D.W.B., A.C., D.L.C., T.E.M., R.P., K.S. and B.S.W. for Dunedin and E-Risk; A.S. and O.A. for ELSA; L.M. and T.E. for ECGUT; W.G.I. and M.M. for MCTFR; R.A. and P.K.E.M. for STR; T.T.M., K.P.H. and E.M.T.-D. for TTP; and P.H. and J.F. for WLS. More details about dataset-level contributions are given in the Supplementary Note. D.W.B. conducted PGI validation analyses in Dunedin and E-Risk. R.A., A.Y., J.P.B., P.K., S.O., M.J., P.M.V., M.N.M., and D.L. contributed to study design. All authors contributed to and critically reviewed the manuscript. D.J.B., A.O., D.C. and P.T. made especially major contributions to the writing and editing.

Corresponding authors

Correspondence to David Cesarini, Daniel J. Benjamin, Patrick Turley or Aysu Okbay.

Ethics declarations

Competing interests

D.A.H., A.K. and members of the 23andMe Research Team are current or former employees of 23andMe, Inc. and hold stock or stock options in 23andMe. The authors declare no other competing interests.

Additional information

Peer review information Nature Human Behaviour thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Supplementary Note and Supplementary Fig. 1.

Reporting summary

Peer review information

Supplementary data 1

Source data for Becker et al. Supplementary Fig. 1 (predictive power of repository multi-trait PGIs).

Supplementary Table 1

Supplementary Tables 1–13 for Becker et al.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 3

Statistical source data.

Cite this article

Becker, J., Burik, C.A.P., Goldman, G. et al. Resource profile and user guide of the Polygenic Index Repository. Nat Hum Behav 5, 1744–1758 (2021). https://doi.org/10.1038/s41562-021-01119-3

  • DOI: https://doi.org/10.1038/s41562-021-01119-3
