  • Technical Report

A method for identifying genetic heterogeneity within phenotypically defined disease subgroups

Abstract

Many common diseases show wide phenotypic variation. We present a statistical method for determining whether phenotypically defined subgroups of disease cases represent different genetic architectures, in which disease-associated variants have different effect sizes in the two subgroups. Our method models the genome-wide distributions of genetic association statistics as mixtures of Gaussians. We apply a global test that does not require explicit identification of disease-associated variants, thus maximizing power in comparison to standard variant-by-variant subgroup analysis. Where evidence for genetic subgrouping is found, we present methods for post hoc identification of the contributing genetic variants. We demonstrate the method on a range of simulated and test data sets for which the expected results are already known. We investigate subgroups of individuals with type 1 diabetes (T1D) defined by autoantibody positivity, establishing evidence for differential genetic architecture with positivity for thyroid-peroxidase-specific antibody, driven generally by variants in known T1D-associated genomic regions.
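To make the idea of a global mixture-based test concrete, the following is a minimal sketch, not the authors' implementation. It assumes each variant carries two Z scores (named Z_a and Z_d in the figure captions) and that these follow a zero-mean, three-category bivariate Gaussian mixture with parameters σ2, σ3, τ, and ρ. The covariance structure of each category, the reading of ρ as a covariance, the null constraint τ = 1, ρ = 0, and all function names are illustrative assumptions. The sketch evaluates both likelihoods at fixed parameter values; a real test would fit each model separately and calibrate the statistic.

```python
import math


def bivariate_normal_logpdf(za, zd, var_a, var_d, cov):
    """Log density of a zero-mean bivariate normal evaluated at (za, zd)."""
    det = var_a * var_d - cov * cov
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    quad = (var_d * za * za - 2.0 * cov * za * zd + var_a * zd * zd) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def mixture_loglik(points, weights, sigma2, sigma3, tau, rho):
    """Log-likelihood of (Z_a, Z_d) pairs under an ASSUMED three-category mixture.

    Category 1: no association, identity covariance.
    Category 2: variance sigma2^2 in Z_a only.
    Category 3: variance sigma3^2 in Z_a, tau^2 in Z_d, covariance rho.
    """
    if min(weights) <= 0.0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    # (var_a, var_d, cov) for each category; this layout is an assumption.
    components = [
        (1.0, 1.0, 0.0),
        (sigma2 ** 2, 1.0, 0.0),
        (sigma3 ** 2, tau ** 2, rho),
    ]
    total = 0.0
    for za, zd in points:
        logs = [
            math.log(w) + bivariate_normal_logpdf(za, zd, va, vd, c)
            for w, (va, vd, c) in zip(weights, components)
        ]
        # Log-sum-exp keeps the sum stable for large |Z| values.
        m = max(logs)
        total += m + math.log(sum(math.exp(v - m) for v in logs))
    return total


def loglik_ratio(points, weights, sigma2, sigma3, tau, rho):
    """Twice the log-likelihood difference between the full model and an
    assumed null in which category 3 has tau = 1 and rho = 0.
    Parameters are held fixed here rather than re-estimated under each model."""
    full = mixture_loglik(points, weights, sigma2, sigma3, tau, rho)
    null = mixture_loglik(points, weights, sigma2, sigma3, 1.0, 0.0)
    return 2.0 * (full - null)


# Hypothetical Z-score pairs, for illustration only (not data from the study).
example_points = [(0.3, -0.2), (2.8, 0.4), (3.5, 2.9), (-1.1, 0.7)]
example_stat = loglik_ratio(example_points, (0.8, 0.15, 0.05), 2.0, 2.5, 2.0, 0.5)
print(example_stat)
```

Because every variant contributes to the likelihood, no per-variant significance threshold is needed, which is the sense in which such a test is global.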

Figure 1: Overview of the three-category model.
Figure 2: Quantile–quantile plot from simulations demonstrating type 1 error rate control of the PLR test.
Figure 3: Observed absolute Z_a and Z_d scores for T1D–RA comparison.
Figure 4: The power of PLR testing to reject H0 (genetic homogeneity between case subgroups) depends on the number of SNPs in category 3 and the underlying values of model parameters σ_2, σ_3, τ, and ρ.
Figure 5: Z_a and Z_d scores for age at diagnosis in T1D, excluding the MHC region.

Acknowledgements

We acknowledge the help of the Diabetes and Inflammation Laboratory Data Service for access and quality control procedures on the data sets used in this study. The JDRF/Wellcome Trust Diabetes and Inflammation Laboratory is in receipt of a Wellcome Trust Strategic Award (107212; J.A.T.) and receives funding from the JDRF (grant 5-SRA-2015-130-A-N; J.A.T.) and the NIHR Cambridge Biomedical Research Centre. The research leading to these results has received funding from the European Union's Seventh Framework Programme (grant FP7/2007-2013; J.A.T.) under grant agreement 241447 (NAIMIT). J.L. is funded by the NIHR Cambridge Biomedical Research Centre and is on the Wellcome Trust PhD program in Mathematical Genomics and Medicine at the University of Cambridge. C.W. is funded by the Wellcome Trust (grants 089989 and 107881) and the MRC (grant MC_UP_1302/5). The Cambridge Institute for Medical Research (CIMR) is in receipt of a Wellcome Trust Strategic Award (100140). We thank M. Simmonds, S. Gough, J. Franklyn, and O. Brand for sharing their AITD genetic association data set and all patients with AITD and control subjects for participating in this study. The AITD UK national collection was funded by the Wellcome Trust. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Contributions

J.L. conceived the statistical methods, wrote the software, performed the analyses, analyzed the data, and wrote the manuscript. J.A.T. analyzed the results and edited the manuscript. C.W. conceived the study, analyzed the data, and wrote the manuscript.

Corresponding authors

Correspondence to James Liley or Chris Wallace.

Ethics declarations

Competing interests

The JDRF-Wellcome Trust Diabetes and Inflammation Laboratory receives funding from Hoffmann La Roche and Eli Lilly and Company.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9, Supplementary Tables 1–9 and Supplementary Note (PDF 9727 kb)

About this article

Cite this article

Liley, J., Todd, J. & Wallace, C. A method for identifying genetic heterogeneity within phenotypically defined disease subgroups. Nat Genet 49, 310–316 (2017). https://doi.org/10.1038/ng.3751

  • DOI: https://doi.org/10.1038/ng.3751
