Minimal phenotyping yields genome-wide association signals of low specificity for major depression


Minimal phenotyping refers to the reliance on a small number of self-reported items for disease case identification, an approach increasingly used in genome-wide association studies (GWAS). Here we report differences in genetic architecture between depression defined by minimal phenotyping and strictly defined major depressive disorder (MDD). Compared with strictly defined MDD, the former has a lower genotype-derived heritability that cannot be explained by the inclusion of milder cases, and a higher proportion of its genome-wide signal reflects genetic liability shared with other conditions. GWAS based on minimal phenotyping definitions preferentially identify loci that are not specific to MDD. Although they generate highly predictive polygenic risk scores, this predictive power can be explained entirely by large sample sizes rather than by specificity for MDD. Our results show that reliance on results from minimal phenotyping may bias views of the genetic architecture of MDD and impede the ability to identify pathways specific to MDD.


Fig. 1: Definitions of depression in UK Biobank.
Fig. 2: Relationship between definitions of depression and environmental risk factors.
Fig. 3: SNP heritability and genetic correlation estimates among definitions of MDD in UK Biobank.
Fig. 4: Genetic correlation between definitions of MDD and other psychiatric conditions.
Fig. 5: Tissue-specific gene expression enrichment in definitions of MDD.
Fig. 6: GWAS hits from minimal phenotyping definition of MDD in the UK Biobank are not specific to MDD.
Fig. 7: Out-of-sample prediction of MDD in PGC cohorts.

Data availability

Genotype and phenotype data used in this study are from the full release (imputation version 2) of the UK Biobank resource, obtained under application no. 28709. We used publicly available summary statistics from other studies, downloadable from the website of the Psychiatric Genomics Consortium; the references for these studies can be found in Supplementary Table 1. We also referenced the 2011 Census aggregate data from the UK Data Service.


1. Lu, J. T., Campeau, P. M. & Lee, B. H. Genotype–phenotype correlation: promiscuity in the era of next-generation sequencing. Obstet. Gynecol. Surv. 69, 728–730 (2014).
2. Ripke, S. et al. Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat. Genet. 45, 1150–1159 (2013).
3. Howard, D. M. et al. Genome-wide association study of depression phenotypes in UK Biobank identifies variants in excitatory synaptic pathways. Nat. Commun. 9, 1470 (2018).
4. Hyde, C. L. et al. Identification of 15 genetic loci associated with risk of major depression in individuals of European descent. Nat. Genet. 48, 1031–1036 (2016).
5. Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668–681 (2018).
6. Flint, J. & Kendler, K. S. The genetics of major depression. Neuron 81, 484–503 (2014).
7. Kessler, R. C. et al. The epidemiology of major depressive disorder: results from the National Comorbidity Survey Replication (NCS-R). JAMA 289, 3095–3105 (2003).
8. Boyd, J. H., Weissman, M. M., Thompson, W. D. & Myers, J. K. Screening for depression in a community sample. Understanding the discrepancies between depression symptom and diagnostic scales. Arch. Gen. Psychiatry 39, 1195–1200 (1982).
9. Breslau, N. Depressive symptoms, major depression, and generalized anxiety: a comparison of self-reports on CES-D and results from diagnostic interviews. Psychiatry Res. 15, 219–229 (1985).
10. Weissman, M. M. & Myers, J. K. Rates and risks of depressive symptoms in a United States urban community. Acta Psychiatr. Scand. 57, 219–231 (1978).
11. Mitchell, A. J., Vaze, A. & Rao, S. Clinical diagnosis of depression in primary care: a meta-analysis. Lancet 374, 609–619 (2009).
12. Mojtabai, R. Clinician-identified depression in community settings: concordance with structured-interview diagnoses. Psychother. Psychosom. 82, 161–169 (2013).
13. Druss, B. G. et al. Understanding mental health treatment in persons without mental diagnoses: results from the National Comorbidity Survey Replication. Arch. Gen. Psychiatry 64, 1196–1203 (2007).
14. Marcus, S. C. & Olfson, M. National trends in the treatment for depression from 1998 to 2007. Arch. Gen. Psychiatry 67, 1265–1273 (2010).
15. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
16. Smith, D. J. et al. Prevalence and characteristics of probable major depression and bipolar disorder within UK Biobank: cross-sectional study of 172,751 participants. PLoS One 8, e75362 (2013).
17. Davis, K. A. S. et al. Mental health in UK Biobank: development, implementation and results from an online questionnaire completed by 157 366 participants. BJPsych Open 4, 83–90 (2018).
18. Kessler, R. C. & Ustun, T. B. The World Mental Health (WMH) Survey initiative version of the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI). Int. J. Methods Psychiatr. Res. 13, 93–121 (2004).
19. Bromet, E. J., Dunn, L. O., Connell, M. M., Dew, M. A. & Schulberg, H. C. Long-term reliability of diagnosing lifetime major depression in a community sample. Arch. Gen. Psychiatry 43, 435–440 (1986).
20. Kendler, K. S., Neale, M. C., Kessler, R. C., Heath, A. C. & Eaves, L. J. The lifetime history of major depression in women. Reliability of diagnosis and heritability. Arch. Gen. Psychiatry 50, 863–870 (1993).
21. Rice, J. P., Rochberg, N., Endicott, J., Lavori, P. W. & Miller, C. Stability of psychiatric diagnoses. An application to the affective disorders. Arch. Gen. Psychiatry 49, 824–830 (1992).
22. Foley, D. L., Neale, M. C. & Kendler, K. S. Genetic and environmental risk factors for depression assessed by subject-rated symptom check list versus structured clinical interview. Psychol. Med. 31, 1413–1423 (2001).
23. Kendler, K. S., Gardner, C. O., Neale, M. C. & Prescott, C. A. Genetic risk factors for major depression in men and women: similar or different heritabilities and same or partly distinct genes? Psychol. Med. 31, 605–616 (2001).
24. Kendler, K. S., Gatz, M., Gardner, C. O. & Pedersen, N. L. Personality and major depression: a Swedish longitudinal, population-based twin study. Arch. Gen. Psychiatry 63, 1113–1120 (2006).
25. Alexopoulos, G. S. et al. ‘Vascular depression’ hypothesis. Arch. Gen. Psychiatry 54, 915–922 (1997).
26. Kessler, R. C. et al. Lifetime prevalence and age-of-onset distributions of DSM-IV disorders in the National Comorbidity Survey Replication. Arch. Gen. Psychiatry 62, 593–602 (2005).
27. Kessler, R. C., Foster, C. L., Saunders, W. B. & Stang, P. E. Social consequences of psychiatric disorders. I: Educational attainment. Am. J. Psychiatry 152, 1026–1032 (1995).
28. Lorant, V. et al. Socioeconomic inequalities in depression: a meta-analysis. Am. J. Epidemiol. 157, 98–112 (2003).
29. Kessler, R. C. Epidemiology of women and depression. J. Affect. Disord. 74, 5–13 (2003).
30. Kendler, K. S., Neale, M. C., Kessler, R. C., Heath, A. C. & Eaves, L. J. A longitudinal twin study of personality and major depression in women. Arch. Gen. Psychiatry 50, 853–862 (1993).
31. Kessler, R. C. The effects of stressful life events on depression. Annu. Rev. Psychol. 48, 191–214 (1997).
32. Mazure, C. M. Life stressors as risk factors in depression. Clin. Psychol. Sci. Pract. 5, 291–313 (1998).
33. Weissbrod, O., Flint, J. & Rosset, S. Estimating SNP-based heritability and genetic correlation in case–control studies directly and with summary statistics. Am. J. Hum. Genet. 103, 89–99 (2018).
34. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
35. Loh, P.-R. et al. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. 47, 1385–1392 (2015).
36. Shi, H., Kichaev, G. & Pasaniuc, B. Contrasting the genetic architecture of 30 complex traits from summary association data. Am. J. Hum. Genet. 99, 139–153 (2016).
37. Price, A. L. et al. Long-range LD can confound genome scans in admixed populations. Am. J. Hum. Genet. 83, 132–135 (2008).
38. CONVERGE consortium. Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature 523, 588–591 (2015).
39. Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium et al. A mega-analysis of genome-wide association studies for major depressive disorder. Mol. Psychiatry 18, 497–511 (2013).
40. Peterson, R. E. et al. Molecular genetic analysis subdivided by adversity exposure suggests etiologic heterogeneity in major depression. Am. J. Psychiatry 175, 545–554 (2018).
41. Northern Ireland Statistics and Research Agency: 2011 Census aggregate data. UK Data Service (2016).
42. Dempster, E. R. & Lerner, I. M. Heritability of threshold characters. Genetics 35, 212–236 (1950).
43. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
44. Cross-Disorder Group of the Psychiatric Genomics Consortium. Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet 381, 1371–1379 (2013).
45. Brainstorm Consortium et al. Analysis of shared heritability in common disorders of the brain. Science 360, eaap8757 (2018).
46. Shi, H., Mancuso, N., Spendlove, S. & Pasaniuc, B. Local genetic correlation gives insights into the shared genetic architecture of complex traits. Am. J. Hum. Genet. 101, 737–751 (2017).
47. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
48. Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
49. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
50. Psychiatric Genomics Consortium Bipolar Disorder Working Group. Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nat. Genet. 43, 977–998 (2011).
51. Trzaskowski, M. et al. Quantifying between-cohort and between-sex genetic heterogeneity in major depressive disorder. Am. J. Med. Genet. B Neuropsychiatr. Genet. 180, 439–447 (2019).
52. Turley, P. et al. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat. Genet. 50, 229–237 (2018).
53. Corfield, E. C., Yang, Y., Martin, N. G. & Nyholt, D. R. A continuum of genetic liability for minor and major depression. Transl. Psychiatry 7, e1131 (2017).
54. Direk, N. et al. An analysis of two genome-wide association meta-analyses identifies a new locus for broad depression phenotype. Biol. Psychiatry 82, 322–329 (2017).
55. Kendler, K. S. et al. Shared and specific genetic risk factors for lifetime major depression, depressive symptoms and neuroticism in three population-based twin samples. Psychol. Med. 49, 2745–2753 (2018).
56. Okbay, A. et al. Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nat. Genet. 48, 624–633 (2016).
57. McIntosh, A. M., Sullivan, P. F. & Lewis, C. M. Uncovering the genetic architecture of major depression. Neuron 102, 91–103 (2019).
58. Kendler, K. S. & Karkowski-Shuman, L. Stressful life events and genetic liability to major depression: genetic control of exposure to the environment? Psychol. Med. 27, 539–547 (1997).
59. Fluharty, M., Taylor, A. E., Grabski, M. & Munafo, M. R. The association of cigarette smoking with depression and anxiety: a systematic review. Nicotine Tob. Res. 19, 3–13 (2017).
60. Wootton, R. E. et al. Evidence for causal effects of lifetime smoking on risk for depression and schizophrenia: a Mendelian randomisation study. Psychol. Med. 6, 1–9 (2019).
61. Gibbons, R. D. et al. The computerized adaptive diagnostic test for major depressive disorder (CAD-MDD): a screening tool for depression. J. Clin. Psychiatry 74, 669–674 (2013).
62. Freimer, N. B. & Mohr, D. C. Integrating behavioural health tracking in human genetics research. Nat. Rev. Genet. 20, 129–130 (2019).
63. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
64. Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285 (2016).
65. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
66. Golan, D., Lander, E. S. & Rosset, S. Measuring missing heritability: inferring the contribution of common variants. Proc. Natl Acad. Sci. USA 111, E5272–E5281 (2014).
67. Abraham, G. & Inouye, M. Fast principal component analysis of large-scale genome-wide data. PLoS One 9, e93766 (2014).
68. 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
69. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
70. Lam, M. et al. RICOPILI: Rapid Imputation for COnsortias PIpeLIne. Bioinformatics (2019).
71. Berardi, D. et al. Increased recognition of depression in primary care. Comparison between primary-care physician and ICD-10 diagnosis of depression. Psychother. Psychosom. 74, 225–230 (2005).
72. Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).
73. Adams, M. J. et al. Factors associated with sharing email information and mental health survey participation in large population cohorts. Int. J. Epidemiol. (2019).
74. Mullins, N. & Lewis, C. M. Genetics of depression: progress at last. Curr. Psychiatry Rep. 19, 43 (2017).
75. Sullivan, P. F. et al. Psychiatric genomics: an update and an agenda. Am. J. Psychiatry 175, 15–27 (2018).
76. Coyne, J. C., Schwenk, T. L. & Smolinski, M. Recognizing depression: a comparison of family physician ratings, self-report, and interview measures. J. Am. Board Fam. Pract. 4, 207–215 (1991).
77. Nevin, R. L. Low validity of self-report in identifying recent mental health diagnosis among U.S. service members completing Pre-Deployment Health Assessment (PreDHA) and deployed to Afghanistan, 2007: a retrospective cohort study. BMC Public Health 9, 376 (2009).
78. Clarke, D. E. et al. DSM-5 field trials in the United States and Canada. Part I: study design, sampling strategy, implementation, and analytic approaches. Am. J. Psychiatry 170, 43–58 (2013).
79. Spitzer, R. L., Forman, J. B. & Nee, J. DSM-III field trials. I. Initial interrater diagnostic reliability. Am. J. Psychiatry 136, 815–817 (1979).
80. Keller, M. B. et al. Results of the DSM-IV mood disorders field trial. Am. J. Psychiatry 152, 843–849 (1995).
81. McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
82. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).



We thank O. Weissbrod, A. Dahl, H. Shi and V. Zuber for insightful discussions. N.C. is supported by the ESPOD Fellowship from the European Bioinformatics Institute (EMBL-EBI) and the Wellcome Sanger Institute. A.V. is supported by the Swedish Brain Foundation. C.M.L. and G.B. are funded by the National Institute for Health Research (NIHR) Maudsley Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. In the last 3 years, M.M.W. has received research funds from the US National Institute of Mental Health (NIMH), the Templeton Foundation and the Sackler Foundation, and has received royalties for publication of books on interpersonal psychotherapy from Perseus Press and Oxford University Press, for books on other topics from the American Psychiatric Association Press, and for the social adjustment scale from Multihealth Systems. The CoLaus|PsyCoLaus study was and is supported by research grants from GlaxoSmithKline, the Faculty of Biology and Medicine of Lausanne and the Swiss National Science Foundation (grants 3200B0-105993, 3200B0-118308, 33CSCO-122661, 33CS30-139468, 33CS30-148401 and 33CS30-177535/1). The PGC has received major funding from the US NIMH and the US National Institute on Drug Abuse (U01 MH109528 and U01 MH1095320). This research was conducted using the UK Biobank resource under application no. 28709 and with the support and collaboration of all investigators who make up the MDD Working Group of the PGC (full list in the Supplementary Note). We are greatly indebted to the hundreds of thousands of individuals who have shared their life experiences with the UK Biobank and PGC investigators.

Author information





N.C. and J.F. designed the study. N.C. and J.A.R. performed the analyses. N.C. and J.F. obtained the data from the UK Biobank resource. M.J.A., T.F.M.A., G.B., E.M.B., T.-K.C., A.J.F., H.J.G., S.P.H., D.F.L., C.M.L., G.L., N.G.M., Y.M., O.M., B.M.-M., B.W.J.H.P., R.H.P., G.P., J.B.P., M.P., J.S., J.W.S., F.S., H.T., R.U., S.V.d.A., A.V., M.M.W. and all investigators from the MDD Working Group of the PGC contributed data from the PGC. N.C., K.S.K. and J.F. interpreted the results and wrote the manuscript.

Corresponding author

Correspondence to Na Cai.

Ethics declarations

Competing interests

C.M.L. is on the scientific advisory board of Myriad Neuroscience. H.J.G. has received travel grants and speaker’s honoraria from Fresenius Medical Care, Neuraxpharm and Janssen Cilag, as well as research funding from Fresenius Medical Care. B.W.J.H.P. has received (unrelated) research grants from Janssen Research and Boehringer Ingelheim.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Simulations of misdiagnosis and misclassification.

a–c, Each boxplot shows \(h^2_{\mathrm{SNP}}\) estimates from 10 simulated phenotypes; the lower and upper boundaries of each box represent the first and third quartiles of all estimates, and the whiskers extend to 1.5 times the interquartile range of the estimates. a, Liability-scale \(h^2_{\mathrm{SNP}}\) does not change when the liability threshold is shifted, \(K_i \in \{0.1, 0.2, 0.3, 0.4, 0.5\}\), for simulated heritabilities \(h_i^2 \in \{0.2, 0.4, 0.6, 0.8\}\). b, Liability-scale \(h^2_{\mathrm{SNP}}\) is deflated as the percentage of controls misdiagnosed as cases increases, when the prevalence of diagnosed cases is kept constant at \(K_i = 0.2\), for simulated heritabilities \(h_i^2 \in \{0.2, 0.4, 0.6, 0.8\}\). c, Liability-scale \(h^2_{\mathrm{SNP}}\) is deflated as the percentage of cases of an “other” disease misclassified as cases of the focal disease increases, if the rG between the two diseases is moderate to low, for simulated \(h_{i,1}^2 = 0.4\), with all cases of the focal disease (at prevalence \(K_{i,1} = 0.2\)) correctly identified as cases.
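The logic behind panels a and b can be sketched in Python under a simple liability-threshold model, assuming liability is the sum of a normally distributed genetic value (variance h²) and independent noise (variance 1 − h²). This is an illustration, not the simulation used here: the authors estimated SNP heritability from phenotypes simulated on genotypes, whereas this sketch uses the true genetic values directly, lets the labelled prevalence rise with misdiagnosis instead of holding it at 0.2, and converts observed-scale heritability to the liability scale with the Dempster–Lerner transformation (the sample-prevalence adjustment of Lee et al. is included as an assumed extension). All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def liability_scale_h2(h2_obs, K, P=None):
    """Convert observed-scale (0/1) heritability to the liability scale.

    K is the population prevalence and P the proportion of cases in the sample.
    With P == K this reduces to the Dempster-Lerner form h2_obs * K(1 - K) / z**2.
    """
    if P is None:
        P = K
    t = STD_NORMAL.inv_cdf(1 - K)  # liability threshold for prevalence K
    z = STD_NORMAL.pdf(t)          # height of the normal density at the threshold
    return h2_obs * K**2 * (1 - K)**2 / (z**2 * P * (1 - P))


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_h2_liability(h2, K, misdiagnosed=0.0, n=100_000, seed=1):
    """Liability-scale h2 recovered from case labels after misdiagnosis.

    Each control is independently relabelled as a case with probability
    `misdiagnosed`, so the labelled prevalence rises above K.
    """
    rng = random.Random(seed)
    g = [rng.gauss(0.0, math.sqrt(h2)) for _ in range(n)]               # genetic values
    liability = [gi + rng.gauss(0.0, math.sqrt(1 - h2)) for gi in g]   # total variance 1
    t = STD_NORMAL.inv_cdf(1 - K)
    true_case = [l > t for l in liability]
    labels = [1 if c or rng.random() < misdiagnosed else 0 for c in true_case]
    # Observed-scale h2: share of label variance explained by the true genetic values.
    h2_obs = pearson_r(labels, g) ** 2
    return liability_scale_h2(h2_obs, sum(labels) / n)


if __name__ == "__main__":
    for m in (0.0, 0.05, 0.1):
        print(m, round(simulate_h2_liability(0.4, 0.2, misdiagnosed=m), 3))
```

With no misdiagnosis the estimate should land close to the simulated 0.4, and it falls as `misdiagnosed` grows, because relabelled controls carry, on average, below-average genetic liability.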

Extended Data Fig. 2 Simulations of misclassification at different heritabilities.

a–d, Estimated \(h^2_{\mathrm{SNP}}\) (y axis), obtained using the --pcgc option with --prevalence K in LDAK, of binary traits \(y_{i,1}\) (\(i \in \{1, \ldots, 10\}\)) with simulated \(h_{i,1}^2 \in \{0.2, 0.4, 0.6, 0.8\}\). For each trait, all cases (at prevalence \(K_{i,1} = 0.2\)) are correctly identified as cases, while varying numbers of cases of a genetically correlated binary trait \(y_{i,2}\) (\(i \in \{1, \ldots, 10\}\)), with heritability and prevalence equal to those of \(y_{i,1}\), are misclassified as cases of \(y_{i,1}\). Genetic correlations between \(y_{i,1}\) and \(y_{i,2}\) (\(r_{G,i} \in \{0, 0.2, 0.4, 0.6, 0.8, 0.95\}\)) are shown in the grey bars above each panel. Each boxplot shows \(h^2_{\mathrm{SNP}}\) estimates from 10 simulated phenotypes; the lower and upper boundaries of each box represent the first and third quartiles of all estimates, and the whiskers extend to 1.5 times the interquartile range of the estimates.
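The misclassification scenario can be sketched in Python, assuming a liability-threshold model for two traits with equal heritability and prevalence whose genetic values correlate at `rg`. This is not LDAK's PCGC estimator: as a simplification it uses the true genetic values of the focal trait instead of genotypes, converts observed-scale heritability with the Dempster–Lerner transformation, and all names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def to_liability_scale(h2_obs, K):
    """Dempster-Lerner transformation of observed-scale h2 at prevalence K."""
    z = STD_NORMAL.pdf(STD_NORMAL.inv_cdf(1 - K))
    return h2_obs * K * (1 - K) / z**2


def simulate_misclassification(h2, K, rg, frac, n=100_000, seed=7):
    """Liability-scale h2 of trait 1 after mislabelling some trait-2 cases.

    Both traits have heritability h2 and prevalence K; their genetic values
    correlate at rg. A fraction `frac` of individuals who are trait-2 cases but
    not trait-1 cases are labelled as trait-1 cases.
    """
    rng = random.Random(seed)
    sd_g, sd_e = math.sqrt(h2), math.sqrt(1 - h2)
    t = STD_NORMAL.inv_cdf(1 - K)  # common liability threshold
    g1, labels = [], []
    for _ in range(n):
        a, b = rng.gauss(0.0, sd_g), rng.gauss(0.0, sd_g)
        g2 = rg * a + math.sqrt(1 - rg**2) * b  # variance h2, correlation rg with a
        case1 = a + rng.gauss(0.0, sd_e) > t
        case2 = g2 + rng.gauss(0.0, sd_e) > t
        mislabelled = case2 and not case1 and rng.random() < frac
        g1.append(a)
        labels.append(1 if case1 or mislabelled else 0)
    h2_obs = pearson_r(labels, g1) ** 2
    return to_liability_scale(h2_obs, sum(labels) / n)


if __name__ == "__main__":
    for rg in (0.0, 0.4, 0.95):
        print(rg, round(simulate_misclassification(0.4, 0.2, rg, frac=0.5), 3))
```

In this idealized setting, mislabelling half of the trait-2-only cases deflates the estimate substantially when `rg = 0`, whereas at `rg = 0.95` the estimate is much less affected, because the mislabelled individuals then share much of the focal trait's genetic liability.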

Extended Data Fig. 3 GWAS on neuroticism and smoking in UK Biobank.

a,b, Manhattan plots of neuroticism score (data field 20127, a quantitative trait ranging from 0 to 12) in 274,107 individuals and of ever-smoked status (data field 20160, a binary trait coded 0 for “No” and 1 for “Yes”) in 336,066 individuals in UK Biobank. Both GWAS used linear regression on all 8,968,716 common SNPs (MAF > 5% in all 337,198 White-British, unrelated samples) in PLINK (version 1.9)32, with 20 principal components (PCs) and genotyping array as covariates. We report all associations with \(P < 5 \times 10^{-8}\) as genome-wide significant (red). SNPs in SVs and the MHC are shown as hollow rather than solid points in all Manhattan plots because population structure is not controlled for in these regions, and all top SNPs within peaks (1-Mb regions) are listed in Supplementary Tables 10 and 11.
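Grouping genome-wide significant SNPs into peaks can be sketched as a greedy pass over SNPs sorted by P value. This is a hypothetical illustration, not the procedure used for Supplementary Tables 10 and 11: it assumes a “1-Mb region” means ±500 kb around each lead SNP, ignores LD, and uses a made-up record type and made-up SNP identifiers.

```python
from typing import List, NamedTuple

GENOME_WIDE_P = 5e-8  # genome-wide significance threshold used in the caption


class Assoc(NamedTuple):
    """One GWAS result (hypothetical record layout)."""
    snp: str
    chrom: str
    pos: int   # base-pair position
    p: float   # association P value


def lead_snps(results: List[Assoc], p_threshold: float = GENOME_WIDE_P,
              window_bp: int = 1_000_000) -> List[Assoc]:
    """Keep the most significant SNP in each peak.

    SNPs are visited from smallest to largest P; a SNP becomes a new lead only
    if no existing lead on the same chromosome lies within window_bp / 2 of it.
    """
    hits = sorted((r for r in results if r.p < p_threshold), key=lambda r: r.p)
    half_window = window_bp // 2
    leads: List[Assoc] = []
    for r in hits:
        if all(r.chrom != lead.chrom or abs(r.pos - lead.pos) > half_window
               for lead in leads):
            leads.append(r)
    return leads


if __name__ == "__main__":
    example = [
        Assoc("snpA", "1", 1_000_000, 1e-10),
        Assoc("snpB", "1", 1_200_000, 3e-9),   # within 500 kb of snpA: same peak
        Assoc("snpC", "1", 5_000_000, 2e-8),
        Assoc("snpD", "2", 1_000_000, 1e-3),   # not genome-wide significant
    ]
    print([r.snp for r in lead_snps(example)])  # ['snpA', 'snpC']
```

Visiting SNPs in order of significance guarantees that each retained SNP is the strongest signal in its window; an LD-based clumping step would be the usual refinement.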

Extended Data Fig. 4 LDSC-SEG analysis of tissue-specific enrichment of \(h^2_{\mathrm{SNP}}\).

a, −log10(P) for heritability enrichment in genes specifically expressed in 44 GTEx tissues, estimated using partitioned heritability in LDSC-SEG, for LifetimeMDD (n = 67,171), PGC1-MDD (n = 18,759), PGC29 (n = 42,455) and a meta-analysis of LifetimeMDD and PGC29 (n = 109,626, PC29.LifetimeMDD, Methods). Whereas PGC29 shows CNS enrichment, neither LifetimeMDD nor the meta-analysis does. This suggests that sample size, differences in genetic architecture and cohort heterogeneity affect results from LDSC-SEG. b, The same analysis performed on down-sampled data for each definition of depression. Each definition is randomly down-sampled to 7,500 cases and 42,500 controls (a constant prevalence of 0.15) to remove confounding of the enrichment analysis by differences in sample size and statistical power. At equal sample size and prevalence, GPNoDep (the no-MDD help-seeking phenotype) is the only definition showing CNS enrichment, suggesting that it may drive the CNS enrichment signal in GPpsy in Fig. 5.
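Down-sampling every definition to the same numbers of cases and controls fixes both the sample size (50,000) and the sample prevalence (7,500/50,000 = 0.15). A minimal sketch using Python's standard `random.sample`; the integer participant IDs, the seed and the function name are hypothetical.

```python
import random


def downsample(case_ids, control_ids, n_cases=7_500, n_controls=42_500, seed=2019):
    """Draw fixed numbers of cases and controls at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    cases = rng.sample(case_ids, n_cases)
    controls = rng.sample(control_ids, n_controls)
    return cases, controls


if __name__ == "__main__":
    # Hypothetical participant IDs: 20,000 cases and 280,000 controls.
    cases, controls = downsample(list(range(20_000)), list(range(20_000, 300_000)))
    prevalence = len(cases) / (len(cases) + len(controls))
    print(len(cases), len(controls), prevalence)  # 7500 42500 0.15
```

Sampling cases and controls separately, rather than sampling individuals and accepting whatever mix results, is what keeps the prevalence identical across definitions.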

Extended Data Fig. 5 GWAS hits from 23andMe are not specific to MDD.

Odds ratios of risk alleles (risk allele ORs) at 17 loci significantly associated with help-seeking-based definitions of MDD in 23andMe27, in GWAS conducted on CIDI-based (LifetimeMDD, purple), help-seeking (GPpsy, red) and no-MDD (GPNoDep, orange) definitions of MDD, as well as on conditions other than MDD: neuroticism, smoking and SCZ (all brown). SNPs missing from a panel were not tested in the respective GWAS. For clarity of display, the scales of the panels differ to accommodate the different magnitudes of ORs in different conditions. ORs at all 17 loci are highly consistent across phenotypes, regardless of whether the phenotype is a definition of MDD or a risk factor or condition other than MDD. All results are shown in Supplementary Table 20. Error bars show the standard errors of the estimates.

Extended Data Fig. 6 Out-of-sample prediction in PGC cohorts.

a, Nagelkerke’s r² for MDD status in 19 PGC29-MDD cohorts explained by polygenic risk scores (PRS) calculated from GWAS of each definition of depression in UK Biobank, controlling for cohort-specific effects. PRS were calculated using effect sizes at independent (LD r² < 0.1) SNPs passing P-value thresholds of \(10^{-4}\), 0.001, 0.01, 0.05, 0.1, 0.2, 0.5 and 1. b, The same analysis performed on down-sampled data (7,500 cases and 42,500 controls) for each definition of depression.
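The two quantities in panel a can be sketched in Python: a PRS computed as the dosage-weighted sum of GWAS effect sizes over SNPs passing a P-value threshold, and Nagelkerke's r² computed from the log-likelihoods of nested logistic regression models. This assumes SNPs have already been LD-clumped and alleles aligned; the threshold tuple and all function names are hypothetical, and model fitting (including cohort covariates) is left to any logistic regression routine. A common convention, assumed here rather than confirmed by the caption, attributes to the PRS the difference in Nagelkerke's r² between a model with covariates plus PRS and a covariates-only model.

```python
import math

# Assumed set of P-value cut-offs; 1.0 keeps every clumped SNP.
P_THRESHOLDS = (1e-4, 0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1.0)


def prs(dosages, effect_sizes, pvalues, threshold):
    """Polygenic risk score for one individual.

    Sums effect-allele dosage (0-2) times GWAS effect size (e.g. log OR) over
    clumped SNPs whose discovery P value is at or below `threshold`.
    """
    return sum(d * b for d, b, p in zip(dosages, effect_sizes, pvalues) if p <= threshold)


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R^2 of a logistic model relative to an intercept-only model.

    ll_null and ll_model are maximized log-likelihoods; n is the sample size.
    """
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)  # upper bound of Cox-Snell R^2
    return cox_snell / max_cox_snell


def prs_nagelkerke_r2(ll_intercept, ll_covariates, ll_covariates_prs, n):
    """r^2 attributed to the PRS: full model minus covariates-only model."""
    return (nagelkerke_r2(ll_intercept, ll_covariates_prs, n)
            - nagelkerke_r2(ll_intercept, ll_covariates, n))


if __name__ == "__main__":
    dosages, betas, pvals = [1, 2, 0], [0.1, -0.2, 0.3], [0.01, 0.5, 1e-5]
    for thr in P_THRESHOLDS:
        print(thr, prs(dosages, betas, pvals, thr))
```

Dividing by the maximum attainable Cox–Snell value is what rescales Nagelkerke's r² to reach 1 for a perfect model.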

Extended Data Fig. 7 Relationship between effective sample size and prediction accuracy.

a, Relationship between the ratio of effective sample sizes of the full cohort (\(N_{FC}\)) and down-sampled (\(N_{DS}\)) data for each definition of depression and the ratio of their mean GWAS χ² statistics, with the black line x = y for reference. Across all definitions of depression, \(\frac{\overline{\chi^2_{FC}} - 1}{\overline{\chi^2_{DS}} - 1}\) is highly correlated with \(\frac{N_{FC}}{N_{DS}}\) (Pearson r² = 0.999, \(P = 5.50 \times 10^{-7}\)), and \(\frac{N_{FC}}{N_{DS}}\) has an effect of β = 1.27 (s.e. = 0.02) on \(\frac{\overline{\chi^2_{FC}} - 1}{\overline{\chi^2_{DS}} - 1}\). b, Nagelkerke’s r² (Nkr²) for MDD status in PGC29 cohorts predicted from PRS of the different definitions of depression at \(N_{FC}\), plotted against the respective empirical Nkr² at \(N_{FC}\), both at a P-value threshold of 1. The Pearson r² between predicted and empirical Nkr² across all definitions was 0.989 (\(P = 4.46 \times 10^{-5}\)). c, For each definition of depression, the effective sample size \(N_X\) required to reach each predicted Nkr² in out-of-sample prediction of MDD status in PGC29 cohorts. Whereas GPpsy requires \(N_X\) = 274,677 (orange vertical dotted line) to achieve an Nkr² of 0.0172 (orange horizontal dotted line), LifetimeMDD requires a smaller \(N_X\) = 129,106 (pink vertical dotted line) to achieve the same Nkr².
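Panel a rests on two relations, sketched here under stated assumptions. The effective sample size of a case–control sample is taken as 4/(1/N_cases + 1/N_controls), a common convention that is not confirmed as the exact definition used here. Under LD score regression without confounding (intercept 1), mean χ² − 1 grows linearly with N, so the ratio in panel a would ideally equal \(N_{FC}/N_{DS}\); the empirical slope of 1.27 in panel a shows this is only a first-order approximation. Function names are hypothetical.

```python
def effective_n(n_cases, n_controls):
    """Effective sample size of a case-control sample (assumed definition)."""
    return 4 / (1 / n_cases + 1 / n_controls)


def predicted_mean_chi2(mean_chi2_ref, n_ref, n_new):
    """Extrapolate mean GWAS chi^2 to a new effective sample size.

    Assumes (mean chi^2 - 1) scales linearly with N, as expected under LD score
    regression with an intercept of 1.
    """
    return 1 + (mean_chi2_ref - 1) * n_new / n_ref


if __name__ == "__main__":
    n_ds = effective_n(7_500, 42_500)  # the down-sampled design of 7,500 cases, 42,500 controls
    print(round(n_ds))  # 25500
```

The factor 4 makes a balanced design with N/2 cases and N/2 controls have an effective size of exactly N.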

Extended Data Fig. 8 Prediction accuracy in cohorts with different percentage of DSM MDD cases.

a, Area under the curve (AUC) of polygenic risk scores (PRS) calculated for each definition of depression in UK Biobank, for MDD status in 20 PGC29-MDD cohorts at a P-value threshold of 0.1 (using all SNPs after LD clumping; see results at all P-value thresholds in Supplementary Table 23), plotted for each cohort against its percentage of cases fulfilling DSM-5 criterion A for MDD (Supplementary Table 21). Strictly defined, CIDI-based LifetimeMDD is the only definition of depression in UK Biobank whose AUC increases as the percentage of cases fulfilling DSM-5 criterion A for MDD in PGC cohorts increases, although it does not give the highest AUC. b, The same analysis after removing the PGC29-MDD cohort rad3, the outlier giving AUC > 0.6 for GPpsy in a. As this is a UK-based cohort, it may contain relatives of UK Biobank participants, which would upwardly bias prediction accuracy in this cohort. We removed this cohort from all analyses shown in Fig. 7, Extended Data Figs. 6 and 7 and Supplementary Table 23.
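The AUC reported here is the probability that a randomly chosen case has a higher PRS than a randomly chosen control, with ties counted as one half. A minimal rank-based (Mann–Whitney) sketch in pure Python with a hypothetical function name; the software actually used to compute AUC is not stated in this caption.

```python
def auc(case_scores, control_scores):
    """ROC AUC via the Mann-Whitney rank-sum statistic; tied scores count 1/2."""
    pooled = sorted([(s, True) for s in case_scores] + [(s, False) for s in control_scores],
                    key=lambda item: item[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j to the end of the block of tied scores starting at i.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based mid-rank shared by the tied block
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_cases, n_controls = len(case_scores), len(control_scores)
    rank_sum_cases = sum(r for r, (_, is_case) in zip(ranks, pooled) if is_case)
    u = rank_sum_cases - n_cases * (n_cases + 1) / 2  # Mann-Whitney U for cases
    return u / (n_cases * n_controls)


if __name__ == "__main__":
    # Toy PRS values (hypothetical), with one tie between a case and a control.
    print(auc([3, 4], [1, 2, 3]))  # 11/12, about 0.917
```

The rank-based form avoids comparing every case with every control, which matters for cohorts with thousands of participants.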

Supplementary information

Supplementary Information

Supplementary Note, Supplementary Figs. 1–7 and Supplementary Tables 1–23

Reporting Summary

Source data

Source Data Fig. 2

Supporting statistics for Fig. 2.


About this article


Cite this article

Cai, N., Revez, J.A., Adams, M.J. et al. Minimal phenotyping yields genome-wide association signals of low specificity for major depression. Nat Genet 52, 437–447 (2020).
