  • Analysis
Minimal phenotyping yields genome-wide association signals of low specificity for major depression

Abstract

Minimal phenotyping refers to the reliance on the use of a small number of self-reported items for disease case identification, increasingly used in genome-wide association studies (GWAS). Here we report differences in genetic architecture between depression defined by minimal phenotyping and strictly defined major depressive disorder (MDD): the former has a lower genotype-derived heritability that cannot be explained by inclusion of milder cases and a higher proportion of the genome contributing to this shared genetic liability with other conditions than for strictly defined MDD. GWAS based on minimal phenotyping definitions preferentially identifies loci that are not specific to MDD, and, although it generates highly predictive polygenic risk scores, the predictive power can be explained entirely by large sample sizes rather than by specificity for MDD. Our results show that reliance on results from minimal phenotyping may bias views of the genetic architecture of MDD and impede the ability to identify pathways specific to MDD.

Fig. 1: Definitions of depression in UK Biobank.
Fig. 2: Relationship between definitions of depression and environmental risk factors.
Fig. 3: SNP heritability and genetic correlation estimates among definitions of MDD in UK Biobank.
Fig. 4: Genetic correlation between definitions of MDD and other psychiatric conditions.
Fig. 5: Tissue-specific gene expression enrichment in definitions of MDD.
Fig. 6: GWAS hits from minimal phenotyping definition of MDD in the UK Biobank are not specific to MDD.
Fig. 7: Out-of-sample prediction of MDD in PGC cohorts.

Data availability

Genotype and phenotype data used in this study are from the full release (imputation version 2) of the UK Biobank resource obtained under application no. 28709. We used publicly available summary statistics from other studies downloadable from the website of the Psychiatric Genomics Consortium (https://www.med.unc.edu/pgc/results-and-downloads), the references for which can be found in Supplementary Table 1. We also referenced the 2011 Census aggregate data from the UK Data Service (https://doi.org/10.5257/census/aggregate-2011-2).

References

  1. Lu, J. T., Campeau, P. M. & Lee, B. H. Genotype–phenotype correlation: promiscuity in the era of next-generation sequencing. Obstet. Gynecol. Surv. 69, 728–730 (2014).

  2. Ripke, S. et al. Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat. Genet. 45, 1150–1159 (2013).

  3. Howard, D. M. et al. Genome-wide association study of depression phenotypes in UK Biobank identifies variants in excitatory synaptic pathways. Nat. Commun. 9, 1470 (2018).

  4. Hyde, C. L. et al. Identification of 15 genetic loci associated with risk of major depression in individuals of European descent. Nat. Genet. 48, 1031–1036 (2016).

  5. Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668–681 (2018).

  6. Flint, J. & Kendler, K. S. The genetics of major depression. Neuron 81, 484–503 (2014).

  7. Kessler, R. C. et al. The epidemiology of major depressive disorder: results from the National Comorbidity Survey Replication (NCS-R). JAMA 289, 3095–3105 (2003).

  8. Boyd, J. H., Weissman, M. M., Thompson, W. D. & Myers, J. K. Screening for depression in a community sample. Understanding the discrepancies between depression symptom and diagnostic scales. Arch. Gen. Psychiatry. 39, 1195–1200 (1982).

  9. Breslau, N. Depressive symptoms, major depression, and generalized anxiety: a comparison of self-reports on CES-D and results from diagnostic interviews. Psychiatry Res. 15, 219–229 (1985).

  10. Weissman, M. M. & Myers, J. K. Rates and risks of depressive symptoms in a United States urban community. Acta Psychiatr. Scand. 57, 219–231 (1978).

  11. Mitchell, A. J., Vaze, A. & Rao, S. Clinical diagnosis of depression in primary care: a meta-analysis. Lancet 374, 609–619 (2009).

  12. Mojtabai, R. Clinician-identified depression in community settings: concordance with structured-interview diagnoses. Psychother. Psychosom. 82, 161–169 (2013).

  13. Druss, B. G. et al. Understanding mental health treatment in persons without mental diagnoses: results from the National Comorbidity Survey Replication. Arch. Gen. Psychiatry 64, 1196–1203 (2007).

  14. Marcus, S. C. & Olfson, M. National trends in the treatment for depression from 1998 to 2007. Arch. Gen. Psychiatry 67, 1265–1273 (2010).

  15. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).

  16. Smith, D. J. et al. Prevalence and characteristics of probable major depression and bipolar disorder within UK Biobank: cross-sectional study of 172,751 participants. PLoS One 8, e75362 (2013).

  17. Davis, K. A. S. et al. Mental health in UK Biobank: development, implementation and results from an online questionnaire completed by 157 366 participants. BJPsych Open 4, 83–90 (2018).

  18. Kessler, R. C. & Ustun, T. B. The World Mental Health (WMH) Survey initiative version of the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI). Int. J. Meth. Psych. Res. 13, 93–121 (2004).

  19. Bromet, E. J., Dunn, L. O., Connell, M. M., Dew, M. A. & Schulberg, H. C. Long-term reliability of diagnosing lifetime major depression in a community sample. Arch. Gen. Psychiatry 43, 435–440 (1986).

  20. Kendler, K. S., Neale, M. C., Kessler, R. C., Heath, A. C. & Eaves, L. J. The lifetime history of major depression in women. Reliability of diagnosis and heritability. Arch. Gen. Psychiatry 50, 863–870 (1993).

  21. Rice, J. P., Rochberg, N., Endicott, J., Lavori, P. W. & Miller, C. Stability of psychiatric diagnoses. An application to the affective disorders. Arch. Gen. Psychiatry 49, 824–830 (1992).

  22. Foley, D. L., Neale, M. C. & Kendler, K. S. Genetic and environmental risk factors for depression assessed by subject-rated symptom check list versus structured clinical interview. Psychol. Med. 31, 1413–1423 (2001).

  23. Kendler, K. S., Gardner, C. O., Neale, M. C. & Prescott, C. A. Genetic risk factors for major depression in men and women: similar or different heritabilities and same or partly distinct genes? Psychol. Med. 31, 605–616 (2001).

  24. Kendler, K. S., Gatz, M., Gardner, C. O. & Pedersen, N. L. Personality and major depression: a Swedish longitudinal, population-based twin study. Arch. Gen. Psychiatry 63, 1113–1120 (2006).

  25. Alexopoulos, G. S. et al. ‘Vascular depression’ hypothesis. Arch. Gen. Psychiatry 54, 915–922 (1997).

  26. Kessler, R. C. et al. Lifetime prevalence and age-of-onset distributions of DSM-IV disorders in the National Comorbidity Survey Replication. Arch. Gen. Psychiatry 62, 593–602 (2005).

  27. Kessler, R. C., Foster, C. L., Saunders, W. B. & Stang, P. E. Social consequences of psychiatric disorders. I: Educational attainment. Am. J. Psychiatry 152, 1026–1032 (1995).

  28. Lorant, V. et al. Socioeconomic inequalities in depression: a meta-analysis. Am. J. Epidemiol. 157, 98–112 (2003).

  29. Kessler, R. C. Epidemiology of women and depression. J. Affect. Disord. 74, 5–13 (2003).

  30. Kendler, K. S., Neale, M. C., Kessler, R. C., Heath, A. C. & Eaves, L. J. A longitudinal twin study of personality and major depression in women. Arch. Gen. Psychiatry 50, 853–862 (1993).

  31. Kessler, R. C. The effects of stressful life events on depression. Ann. Rev. Psychol. 48, 191–214 (1997).

  32. Mazure, C. M. Life stressors as risk factors in depression. Clinical Psychology: Science and Practice 5, 291–313 (1998).

  33. Weissbrod, O., Flint, J. & Rosset, S. Estimating SNP-based heritability and genetic correlation in case–control studies directly and with summary statistics. Am. J. Hum. Genet. 103, 89–99 (2018).

  34. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).

  35. Loh, P.-R. et al. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. 47, 1385–1392 (2015).

  36. Shi, H., Kichaev, G. & Pasaniuc, B. Contrasting the genetic architecture of 30 complex traits from summary association data. Am. J. Hum. Genet. 99, 139–153 (2016).

  37. Price, A. L. et al. Long-range LD can confound genome scans in admixed populations. Am. J. Hum. Genet. 83, 132–135 (2008).

  38. CONVERGE consortium. Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature 523, 588–591 (2015).

  39. Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium et al. A mega-analysis of genome-wide association studies for major depressive disorder. Mol. Psychiatry 18, 497–511 (2013).

  40. Peterson, R. E. et al. Molecular genetic analysis subdivided by adversity exposure suggests etiologic heterogeneity in major depression. Am. J. Psychiatry 175, 545–554 (2018).

  41. Northern Ireland Statistics and Research Agency: 2011 Census aggregate data. UK Data Service https://doi.org/10.5257/census/aggregate-2011-1 (2016).

  42. Dempster, E. R. & Lerner, I. M. Heritability of threshold characters. Genetics 35, 212–236 (1950).

  43. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).

  44. Cross-Disorder Group of the Psychiatric Genomics Consortium. Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet 381, 1371–1379 (2013).

  45. Brainstorm Consortium et al. Analysis of shared heritability in common disorders of the brain. Science 360, eaap8757 (2018).

  46. Shi, H., Mancuso, N., Spendlove, S. & Pasaniuc, B. Local genetic correlation gives insights into the shared genetic architecture of complex traits. Am. J. Hum. Genet. 101, 737–751 (2017).

  47. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).

  48. Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).

  49. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).

  50. Psychiatric Genomics Consortium Bipolar Disorder Working Group. Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nat. Genet. 43, 977–998 (2011).

  51. Trzaskowski, M. et al. Quantifying between-cohort and between-sex genetic heterogeneity in major depressive disorder. Am. J. Med. Genet. B Neuropsychiatr. Genet. 180, 439–447 (2019).

  52. Turley, P. et al. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat. Genet. 50, 229–237 (2018).

  53. Corfield, E. C., Yang, Y., Martin, N. G. & Nyholt, D. R. A continuum of genetic liability for minor and major depression. Transl. Psychiatry 7, e1131 (2017).

  54. Direk, N. et al. An analysis of two genome-wide association meta-analyses identifies a new locus for broad depression phenotype. Biol. Psychiatry 82, 322–329 (2017).

  55. Kendler, K. S. et al. Shared and specific genetic risk factors for lifetime major depression, depressive symptoms and neuroticism in three population-based twin samples. Psychol. Med. 49, 2745–2753 (2018).

  56. Okbay, A. et al. Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nat. Genet. 48, 624–633 (2016).

  57. McIntosh, A. M., Sullivan, P. F. & Lewis, C. M. Uncovering the genetic architecture of major depression. Neuron 102, 91–103 (2019).

  58. Kendler, K. S. & Karkowski-Shuman, L. Stressful life events and genetic liability to major depression: genetic control of exposure to the environment? Psychol. Med. 27, 539–547 (1997).

  59. Fluharty, M., Taylor, A. E., Grabski, M. & Munafo, M. R. The association of cigarette smoking with depression and anxiety: a systematic review. Nicotine Tob. Res. 19, 3–13 (2017).

  60. Wootton, R. E. et al. Evidence for causal effects of lifetime smoking on risk for depression and schizophrenia: a Mendelian randomisation study. Psychol. Med. 6, 1–9 (2019).

  61. Gibbons, R. D. et al. The computerized adaptive diagnostic test for major depressive disorder (CAD-MDD): a screening tool for depression. J. Clin. Psychiatry 74, 669–674 (2013).

  62. Freimer, N. B. & Mohr, D. C. Integrating behavioural health tracking in human genetics research. Nat. Rev. Genet. 20, 129–130 (2019).

  63. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  64. Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285 (2016).

  65. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).

  66. Golan, D., Lander, E. S. & Rosset, S. Measuring missing heritability: inferring the contribution of common variants. Proc. Natl Acad. Sci. USA 111, E5272–E5281 (2014).

  67. Abraham, G. & Inouye, M. Fast principal component analysis of large-scale genome-wide data. PLoS One 9, e93766 (2014).

  68. 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  69. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).

  70. Lam, M. et al. RICOPILI: Rapid Imputation for COnsortias PIpeLIne. Bioinformatics https://doi.org/10.1093/bioinformatics/btz633 (2019).

  71. Berardi, D. et al. Increased recognition of depression in primary care. Comparison between primary-care physician and ICD-10 diagnosis of depression. Psychother. Psychosom. 74, 225–230 (2005).

  72. Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).

  73. Adams, M. J. et al. Factors associated with sharing email information and mental health survey participation in large population cohorts. Int. J. Epidemiol. https://doi.org/10.1101/471433 (2019).

  74. Mullins, N. & Lewis, C. M. Genetics of depression: progress at last. Curr. Psychiatry Rep. 19, 43 (2017).

  75. Sullivan, P. F. et al. Psychiatric genomics: an update and an agenda. Am. J. Psychiatry 175, 15–27 (2018).

  76. Coyne, J. C., Schwenk, T. L. & Smolinski, M. Recognizing depression: a comparison of family physician ratings, self-report, and interview measures. J. Am. Board Fam. Pract. 4, 207–215 (1991).

  77. Nevin, R. L. Low validity of self-report in identifying recent mental health diagnosis among U.S. service members completing Pre-Deployment Health Assessment (PreDHA) and deployed to Afghanistan, 2007: a retrospective cohort study. BMC Public Health 9, 376 (2009).

  78. Clarke, D. E. et al. DSM-5 field trials in the United States and Canada. Part I: study design, sampling strategy, implementation, and analytic approaches. Am. J. Psychiatry 170, 43–58 (2013).

  79. Spitzer, R. L., Forman, J. B. & Nee, J. DSM-III field trials. I. Initial interrater diagnostic reliability. Am. J. Psychiatry 136, 815–817 (1979).

  80. Keller, M. B. et al. Results of the DSM-IV mood disorders field trial. Am. J. Psychiatry 152, 843–849 (1995).

  81. McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).

  82. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).


Acknowledgements

We thank O. Weissbrod, A. Dahl, H. Shi and V. Zuber for insightful discussions. N.C. is supported by the ESPOD Fellowship from the European Bioinformatics Institute (EMBL-EBI) and the Wellcome Sanger Institute. A.V. is supported by the Swedish Brain Foundation. C.M.L. and G.B. are funded by the National Institute for Health Research (NIHR) Maudsley Biomedical Research Centre at South London Maudsley Foundation Trust and King’s College London. In the last 3 years, M.M.W. has received research funds from the US National Institute of Mental Health (NIMH), the Templeton Foundation and the Sackler Foundation and has received royalties for publication of books on interpersonal psychotherapy from Perseus Press and Oxford University Press, on other topics from the American Psychiatric Association Press and royalties on the social adjustment scale from Multihealth Systems. The CoLaus|PsyCoLaus study was and is supported by research grants from GlaxoSmithKline, the Faculty of Biology and Medicine of Lausanne and the Swiss National Science Foundation (grants 3200B0-105993, 3200B0-118308, 33CSCO-122661, 33CS30-139468, 33CS30-148401 and 33CS30-177535/1). The PGC has received major funding from the US NIMH and the US National Institute of Drug Abuse (U01 MH109528 and U01 MH1095320). This research was conducted using the UK Biobank resource under application no. 28709 and with the support and collaboration from all investigators who make up the MDD Working Group of the PGC (full list in the Supplementary Note). We are greatly indebted to the hundreds of thousands of individuals who have shared their life experiences with the UK Biobank and PGC investigators.

Author information

Contributions

N.C. and J.F. designed the study. N.C. and J.A.R. performed the analyses. N.C. and J.F. obtained the data from the UK Biobank resource. M.J.A., T.F.M.A., G.B., E.M.B., T.-K.C., A.J.F., H.J.G., S.P.H., D.F.L., C.M.L., G.L., N.G.M., Y.M., O.M., B.M.-M., B.W.J.H.P., R.H.P., G.P., J.B.P., M.P., J.S., J.W.S., F.S., H.T., R.U., S.V.d.A., A.V., M.M.W. and all investigators from the MDD Working Group of the PGC contributed data from the PGC. N.C., K.S.K. and J.F. interpreted the results and wrote the manuscript.

Corresponding author

Correspondence to Na Cai.

Ethics declarations

Competing interests

C.M.L. is on the scientific advisory board of Myriad Neuroscience. H.J.G. has received travel grants and speaker’s honoraria from Fresenius Medical Care, Neuraxpharm and Janssen Cilag as well as research funding from Fresenius Medical Care. B.W.J.H.P. has received (non-related) research grants from Janssen Research and Boehringer Ingelheim.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Simulations of misdiagnosis and misclassification.

a–c, Each boxplot shows \(h^2_{\mathrm{SNP}}\) estimates from 10 simulated phenotypes; the lower and upper boundaries of the boxes represent the first and third quartiles of all estimates, and the whiskers extend to 1.5 times the interquartile range of the estimates. a, Liability-scale \(h^2_{\mathrm{SNP}}\) does not change when the liability threshold is shifted (prevalence \(K_i \in \left\{ 0.1, 0.2, 0.3, 0.4, 0.5 \right\}\)) for simulated heritabilities \(h_i^2 \in \left\{ 0.2, 0.4, 0.6, 0.8 \right\}\). b, Liability-scale \(h^2_{\mathrm{SNP}}\) is deflated as an increasing percentage of controls is misdiagnosed as cases, when the prevalence of diagnosed cases is kept constant at \(K_i = 0.2\), for simulated heritabilities \(h_i^2 \in \left\{ 0.2, 0.4, 0.6, 0.8 \right\}\). c, Liability-scale \(h^2_{\mathrm{SNP}}\) is deflated as an increasing percentage of cases of an “other” disease is misclassified as cases of the focal disease, if \(r_G\) between the two diseases is moderate to low, for simulated \(h_{i,1}^2 = 0.4\), with all cases (at prevalence \(K_{i,1} = 0.2\)) correctly identified as cases.
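The liability-scale estimates in this legend rest on the liability threshold model (ref. 42). The following is a minimal sketch of the standard rescaling from observed-scale to liability-scale heritability, not the PCGC estimator used in the simulations; the function name, argument names and example values are illustrative assumptions.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, prevalence, case_fraction=None):
    """Rescale an observed-scale heritability to the liability scale.

    prevalence    -- population prevalence K of the binary trait
    case_fraction -- proportion of cases P in the analysed sample;
                     defaults to K, i.e. no case-control ascertainment
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    k = prevalence
    p = k if case_fraction is None else case_fraction
    if not 0.0 < p < 1.0:
        raise ValueError("case_fraction must lie strictly between 0 and 1")

    standard_normal = NormalDist()
    # Liability threshold above which an individual is a case.
    threshold = standard_normal.inv_cdf(1.0 - k)
    # Height of the standard normal density at that threshold.
    z = standard_normal.pdf(threshold)
    # With P = K this reduces to h2_observed * K * (1 - K) / z**2.
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


if __name__ == "__main__":
    # Illustrative values only: the same observed-scale estimate maps to
    # different liability-scale values depending on the assumed prevalence.
    for k in (0.1, 0.2, 0.3, 0.4, 0.5):
        print(k, round(observed_to_liability(0.1, k), 4))
```

Because the rescaling depends only on the assumed prevalence and the sample case fraction, it cannot correct for controls or cases of another disease being labelled as cases, which is consistent with the deflation described in b and c.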

Extended Data Fig. 2 Simulations of misclassification at different heritabilities.

a–d, Estimated \(h^2_{\mathrm{SNP}}\) (using the --pcgc option with --prevalence K in LDAK, plotted on the y axis) of binary traits (\(y_{i,1}\), where \(i \in \left\{ 1, \ldots, 10 \right\}\)) with simulated \(h_{i,1}^2\) of 0.2, 0.4, 0.6 and 0.8, for each of which all cases (at prevalence \(K_{i,1} = 0.2\)) are correctly identified as cases, while varying numbers of cases of a genetically correlated binary trait (\(y_{i,2}\), where \(i \in \left\{ 1, \ldots, 10 \right\}\)) of equal heritability and prevalence are misclassified as cases of \(y_{i,1}\). Genetic correlations between \(y_{i,1}\) and \(y_{i,2}\) (\(r_{G,i} \in \left\{ 0, 0.2, 0.4, 0.6, 0.8, 0.95 \right\}\)) are shown in the grey bars above each panel. Each boxplot shows \(h^2_{\mathrm{SNP}}\) estimates from 10 simulated phenotypes; the lower and upper boundaries of the boxes represent the first and third quartiles of all estimates, and the whiskers extend to 1.5 times the interquartile range of the estimates.

Extended Data Fig. 3 GWAS on neuroticism and smoking in UK Biobank.

a, b, Manhattan plots of GWAS on neuroticism score (data field 20127, quantitative trait from 0 to 12) in 274,107 individuals and ever-smoked status (data field 20160, binary trait: 0 for “No” and 1 for “Yes”) in 336,066 individuals in UK Biobank. Both analyses used linear regression on all 8,968,716 common SNPs (MAF > 5% in all 337,198 White-British, unrelated samples) in PLINK (version 1.9)32, with 20 PCs and genotyping array as covariates. We report all associations with P values smaller than \(5 \times 10^{-8}\) as genome-wide significant (red). SNPs in SVs and the MHC are shown as hollow rather than solid points in all Manhattan plots because population structure is not controlled for in these regions; all top SNPs within peaks (1-Mb regions) are listed in Supplementary Tables 10 and 11.

Extended Data Fig. 4 LDSC-SEG analysis of tissue-specific enrichment of h2SNP.

a, \(-\log_{10}(P)\) of enrichment in heritability in genes specifically expressed in 44 GTEx tissues, estimated using partitioned heritability in LDSC-SEG, for LifetimeMDD (n = 67,171), PGC1-MDD (n = 18,759), PGC29 (n = 42,455) and a meta-analysis of LifetimeMDD and PGC29 (n = 109,626, PC29.LifetimeMDD, Methods). While PGC29 shows CNS enrichment, neither LifetimeMDD nor the meta-analysis shows the same enrichment. This suggests that sample size, differences in genetic architecture and cohort heterogeneity affect results from LDSC-SEG. b, The same analysis performed on down-sampled data for each definition of depression. Each definition is randomly down-sampled to 7,500 cases and 42,500 controls (a constant prevalence of 0.15) to remove confounding of the enrichment analysis by sample size and differences in statistical power. At equal sample size and prevalence, GPNoDep (no-MDD help-seeking phenotype) is the only definition showing CNS enrichment, suggesting that it may be driving the CNS enrichment signal in GPpsy in Fig. 5.
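The down-sampling step in b can be sketched as follows. This is a minimal illustration that assumes cases and controls are available as collections of sample identifiers; the function name, the seed handling and the synthetic identifiers are hypothetical and are not taken from the study's pipeline.

```python
import random


def downsample(case_ids, control_ids, n_cases=7500, n_controls=42500, seed=0):
    """Randomly draw a fixed number of cases and controls without replacement.

    Returns the sampled cases, the sampled controls and the resulting
    sample prevalence (cases / total).
    """
    case_ids = list(case_ids)
    control_ids = list(control_ids)
    if len(case_ids) < n_cases or len(control_ids) < n_controls:
        raise ValueError("not enough cases or controls to down-sample")

    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    cases = rng.sample(case_ids, n_cases)
    controls = rng.sample(control_ids, n_controls)
    prevalence = n_cases / (n_cases + n_controls)
    return cases, controls, prevalence


if __name__ == "__main__":
    # Synthetic identifiers standing in for one definition of depression.
    cases, controls, prevalence = downsample(range(10000), range(10000, 80000))
    print(len(cases), len(controls), prevalence)
```

Applying the same target counts to every definition fixes both the total sample size (50,000) and the prevalence (0.15), so differences in enrichment cannot be attributed to either.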

Extended Data Fig. 5 GWAS hits from 23andMe are not specific to MDD.

Odds ratios of risk alleles (Risk Allele ORs) at 17 loci significantly associated with help-seeking-based definitions of MDD in 23andMe27, in GWAS conducted on CIDI-based (LifetimeMDD, in purple), help-seeking-based (GPpsy, in red) and no-MDD (GPNoDep, in orange) definitions of MDD, as well as on conditions other than MDD: neuroticism, smoking and SCZ (all in brown). SNPs missing from a panel were not tested in the respective GWAS. For clarity of display, scales vary between panels to accommodate the different magnitudes of ORs in different conditions. ORs at all 17 loci are highly consistent across phenotypes, regardless of whether the phenotype is a definition of MDD or a risk factor or condition other than MDD. All results are shown in Supplementary Table 20. Error bars show the standard errors of the estimates.

Extended Data Fig. 6 Out-of-sample prediction in PGC cohorts.

a, Nagelkerke’s \(r^2\) for polygenic risk scores (PRS) calculated for each definition of depression in UK Biobank, predicting MDD status in 19 PGC29-MDD cohorts while controlling for cohort-specific effects. PRS were calculated using effect sizes at independent (LD \(r^2 < 0.1\)) SNPs passing P-value thresholds of \(10^{-4}\), 0.001, 0.01, 0.05, 0.1, 0.2, 0.5 and 1, respectively, in GWAS performed on all definitions of depression in UK Biobank. b, The same analysis performed on down-sampled data (7,500 cases, 42,500 controls) for each definition of depression.
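The scoring and evaluation described in this legend can be sketched as below. This is a simplified illustration under stated assumptions: SNPs are taken to be LD-clumped already, a SNP "passes" a threshold when its P value is less than or equal to it, and the two log-likelihoods come from logistic regressions fitted elsewhere (without and with the PRS). All function names, SNP identifiers and numbers are hypothetical.

```python
import math


def polygenic_score(dosages, effects, p_values, p_threshold):
    """Sum of effect size x risk-allele dosage over SNPs passing a threshold.

    dosages  -- {snp_id: allele dosage (0..2)} for one individual
    effects  -- {snp_id: GWAS effect size, e.g. log odds ratio}
    p_values -- {snp_id: GWAS P value}
    """
    score = 0.0
    for snp, beta in effects.items():
        if p_values[snp] <= p_threshold and snp in dosages:
            score += beta * dosages[snp]
    return score


def nagelkerke_r2(loglik_null, loglik_model, n_samples):
    """Nagelkerke's r2 from the log-likelihoods of two nested models."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n_samples)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n_samples)
    return cox_snell / max_cox_snell


if __name__ == "__main__":
    effects = {"snpA": 0.10, "snpB": -0.20, "snpC": 0.05}
    p_values = {"snpA": 1e-5, "snpB": 0.03, "snpC": 0.6}
    dosages = {"snpA": 2, "snpB": 1, "snpC": 0}
    for threshold in (1e-4, 0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1.0):
        print(threshold, polygenic_score(dosages, effects, p_values, threshold))
```

Looser thresholds admit more SNPs into the score, which is why prediction accuracy is reported separately for each threshold.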

Extended Data Fig. 7 Relationship between effective sample size and prediction accuracy.

a, Relationship between the ratio of effective sample sizes of the full cohort (\(N_{FC}\)) and the down-sampled data (\(N_{DS}\)) for each definition of depression and the ratio of their mean chi-square (\(\chi^2\)) statistics from GWAS, with the black line x = y for reference. Across all definitions of depression, \(\frac{\overline{\chi^2_{FC}} - 1}{\overline{\chi^2_{DS}} - 1}\) is highly correlated with \(\frac{N_{FC}}{N_{DS}}\) (Pearson \(r^2 = 0.999\), \(P = 5.50 \times 10^{-7}\)), and \(\frac{N_{FC}}{N_{DS}}\) has an effect of beta = 1.27 (s.e. = 0.02) on \(\frac{\overline{\chi^2_{FC}} - 1}{\overline{\chi^2_{DS}} - 1}\). b, Nagelkerke’s \(r^2\) (Nk\(r^2\)) for MDD status in PGC29 cohorts predicted for PRS of different definitions of depression at \(N_{FC}\), plotted against their respective empirical Nk\(r^2\) at \(N_{FC}\), both at P-value threshold = 1. The Pearson correlation \(r^2\) between predicted and actual Nk\(r^2\) across all definitions was 0.989 (\(P = 4.46 \times 10^{-5}\)). c, For each definition of depression, the effective sample size \(N_X\) required for each predicted Nk\(r^2\) in out-of-sample prediction of MDD status in PGC29 cohorts. While \(N_X = 274,677\) (orange vertical dotted line) is needed for GPpsy to achieve an Nk\(r^2\) of 0.0172 (orange horizontal dotted line), a smaller \(N_X = 129,106\) (pink vertical dotted line) is needed to achieve the same Nk\(r^2\) for LifetimeMDD.
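The two ratios compared in a can be sketched as below. The legend does not define effective sample size, so the sketch assumes the commonly used case-control form 4 / (1/N_cases + 1/N_controls); that formula, the function names and the example chi-square values are assumptions for illustration only.

```python
def effective_sample_size(n_cases, n_controls):
    """Effective sample size of a case-control study.

    Assumed definition: 4 / (1/n_cases + 1/n_controls), which equals the
    total sample size when cases and controls are balanced.
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("case and control counts must be positive")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def chi2_excess_ratio(mean_chi2_full, mean_chi2_down):
    """Ratio of (mean chi-square - 1) between full and down-sampled GWAS."""
    if mean_chi2_down <= 1.0:
        raise ValueError("down-sampled mean chi-square must exceed 1")
    return (mean_chi2_full - 1.0) / (mean_chi2_down - 1.0)


if __name__ == "__main__":
    # Down-sampled design from the legends: 7,500 cases, 42,500 controls.
    n_ds = effective_sample_size(7500, 42500)
    print(n_ds)
    # Hypothetical mean chi-square values, purely for illustration.
    print(chi2_excess_ratio(1.5, 1.1))
```

If the excess of the mean chi-square over 1 grows in proportion to sample size, the two ratios fall on the x = y line; the fitted slope of 1.27 reported in a indicates a somewhat steeper relationship in these data.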

Extended Data Fig. 8 Prediction accuracy in cohorts with different percentage of DSM MDD cases.

a, Area under the curve (AUC) for polygenic risk scores (PRS) calculated for each definition of depression in UK Biobank, predicting MDD status in 20 PGC29-MDD cohorts at a P-value threshold of 0.1 (using all SNPs after LD-clumping; see results at all P-value thresholds in Supplementary Table 23). The AUC for each cohort is plotted against its percentage of cases fulfilling DSM-5 criterion A for MDD (see Supplementary Table 21). Strictly defined, CIDI-based LifetimeMDD is the only definition of depression in UK Biobank whose AUC increases as the percentage of cases fulfilling DSM-5 criterion A for MDD in PGC cohorts increases, despite not giving the highest AUC. b, The same analysis after removing the PGC29-MDD cohort rad3, the outlier giving AUC > 0.6 for GPpsy in a. As this is a UK-based cohort, it may contain relatives of individuals in UK Biobank, which would upwardly bias its prediction accuracy. This cohort has been removed from all analyses shown in Fig. 7, Extended Data Figs. 6 and 7 and Supplementary Table 23.
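The AUC reported here can be read as the probability that a randomly chosen case has a higher PRS than a randomly chosen control. A minimal sketch of that rank-based calculation follows; it ignores the cohort-level covariates a real analysis would include, and the function name and scores are hypothetical.

```python
def auc(case_scores, control_scores):
    """Probability that a case outscores a control; ties count as one half."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical PRS values for a handful of cases and controls.
    print(auc([0.9, 0.4, 0.7], [0.1, 0.5, 0.3]))
```

An AUC of 0.5 corresponds to no discrimination between cases and controls, which puts the outlier value above 0.6 for rad3 in context.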

Supplementary information

Supplementary Information

Supplementary Note, Supplementary Figs. 1–7 and Supplementary Tables 1–23

Reporting Summary

Source data

Source Data Fig. 2

Supporting statistics for Fig. 2.

Cite this article

Cai, N., Revez, J.A., Adams, M.J. et al. Minimal phenotyping yields genome-wide association signals of low specificity for major depression. Nat Genet 52, 437–447 (2020). https://doi.org/10.1038/s41588-020-0594-5

  • DOI: https://doi.org/10.1038/s41588-020-0594-5
