Abstract
Many common diseases show wide phenotypic variation. We present a statistical method for determining whether phenotypically defined subgroups of disease cases represent different genetic architectures, in which disease-associated variants have different effect sizes in the two subgroups. Our method models the genome-wide distributions of genetic association statistics with Gaussian mixtures. We apply a global test that does not require explicit identification of disease-associated variants, which increases power compared with standard variant-by-variant subgroup analysis. Where evidence for genetic subgrouping is found, we present methods for post hoc identification of the contributing genetic variants. We demonstrate the method on a range of simulated and test data sets for which the expected results are already known. We then investigate subgroups of individuals with type 1 diabetes (T1D) defined by autoantibody positivity and find evidence for differential genetic architecture with respect to positivity for thyroid-peroxidase-specific antibody, driven largely by variants in known T1D-associated genomic regions.
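The abstract does not spell out the model, so the following is only a deliberately simplified sketch of what a global mixture-based test can look like, not the published implementation. It assumes that each SNP contributes two Z scores: `za`, from comparing all cases with controls, and `zd`, from comparing the two case subgroups. It further assumes a three-component, zero-mean Gaussian mixture with diagonal covariance: unassociated SNPs, SNPs associated with disease but with equal effects in both subgroups, and SNPs whose effects differ between subgroups (inflated variance of `zd`). The null hypothesis fixes the weight of the third component at zero, and the likelihood ratio statistic is computed without classifying any individual SNP. All function names, starting values and simulation settings are hypothetical. Because the null places a parameter on the boundary of the parameter space, the statistic is not chi-squared distributed, so no p-value is computed here.

```python
import numpy as np


def log_normal(z, var):
    """Elementwise log density of N(0, var) at z."""
    return -0.5 * (np.log(2.0 * np.pi * var) + z ** 2 / var)


def e_step(za, zd, pi, sigma2, tau2):
    """Return the total log-likelihood and per-SNP component responsibilities."""
    comps = np.column_stack([
        log_normal(za, 1.0) + log_normal(zd, 1.0),      # 0: unassociated
        log_normal(za, sigma2) + log_normal(zd, 1.0),   # 1: associated, same effect in both subgroups
        log_normal(za, sigma2) + log_normal(zd, tau2),  # 2: associated, effect differs between subgroups
    ])
    with np.errstate(divide="ignore"):  # log(0) = -inf is fine when a weight is fixed at 0
        comps = comps + np.log(pi)
    top = comps.max(axis=1, keepdims=True)
    per_snp = top[:, 0] + np.log(np.exp(comps - top).sum(axis=1))  # log-sum-exp per SNP
    return per_snp.sum(), np.exp(comps - per_snp[:, None])


def fit_mixture(za, zd, pi, sigma2, tau2, n_iter=500, tol=1e-8):
    """EM with both variances constrained to be >= 1; a weight started at 0 stays at 0."""
    pi = np.asarray(pi, dtype=float)
    prev = -np.inf
    for _ in range(n_iter):
        loglik, resp = e_step(za, zd, pi, sigma2, tau2)
        if loglik - prev < tol:
            break
        prev = loglik
        pi = resp.mean(axis=0)
        w = resp[:, 1] + resp[:, 2]  # SNPs treated as disease-associated
        sigma2 = max(1.0, np.sum(w * za ** 2) / np.sum(w))
        if resp[:, 2].sum() > 0:
            tau2 = max(1.0, np.sum(resp[:, 2] * zd ** 2) / resp[:, 2].sum())
    loglik, _ = e_step(za, zd, pi, sigma2, tau2)  # log-likelihood at the returned parameters
    return pi, sigma2, tau2, loglik


def subgroup_lrt(za, zd):
    """Likelihood ratio statistic for H0: no SNPs have different effects in the two subgroups."""
    pi0, s0, _, ll0 = fit_mixture(za, zd, [0.9, 0.1, 0.0], 2.0, 1.0)  # null: weight 2 fixed at 0
    # Alternative starts from the null fit with component 1 split evenly and tau2 = 1;
    # this reproduces ll0 exactly, and EM never decreases the likelihood.
    start = [pi0[0], pi0[1] / 2, pi0[1] / 2]
    pi1, _, tau2_1, ll1 = fit_mixture(za, zd, start, s0, 1.0)
    return 2.0 * (ll1 - ll0), pi1, tau2_1


def simulate(n, p_assoc, p_diff, sd_a, sd_d, rng):
    """Hypothetical simulation with component labels 0/1/2 as in e_step."""
    labels = rng.choice(3, size=n, p=[1.0 - p_assoc - p_diff, p_assoc, p_diff])
    za = rng.normal(0.0, np.where(labels > 0, sd_a, 1.0))
    zd = rng.normal(0.0, np.where(labels == 2, sd_d, 1.0))
    return za, zd


rng = np.random.default_rng(2017)
za0, zd0 = simulate(10_000, 0.10, 0.00, 5.0, 2.0, rng)  # no subgroup differences
za1, zd1 = simulate(10_000, 0.05, 0.05, 5.0, 2.0, rng)  # half of associated SNPs differ
stat0, _, _ = subgroup_lrt(za0, zd0)
stat1, pi1, tau2_1 = subgroup_lrt(za1, zd1)
print(f"LRT without differences: {stat0:.2f}; with differences: {stat1:.2f}; tau^2: {tau2_1:.2f}")
```

Starting the alternative fit from the null solution keeps the statistic non-negative, since the alternative model nests the null. The sketch also treats SNPs as independent, which linkage disequilibrium violates, so in practice the statistic would need calibration (for example, against permuted subgroup labels) before any conclusion could be drawn.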
Acknowledgements
We acknowledge the help of the Diabetes and Inflammation Laboratory Data Service for access and quality control procedures on the data sets used in this study. The JDRF/Wellcome Trust Diabetes and Inflammation Laboratory is in receipt of a Wellcome Trust Strategic Award (107212; J.A.T.) and receives funding from the JDRF (grant 5-SRA-2015-130-A-N; J.A.T.) and the NIHR Cambridge Biomedical Research Centre. The research leading to these results has received funding from the European Union's Seventh Framework Programme (grant FP7/2007-2013; J.A.T.) under grant agreement 241447 (NAIMIT). J.L. is funded by the NIHR Cambridge Biomedical Research Centre and is on the Wellcome Trust PhD program in Mathematical Genomics and Medicine at the University of Cambridge. C.W. is funded by the Wellcome Trust (grants 089989 and 107881) and the MRC (grant MC_UP_1302/5). The Cambridge Institute for Medical Research (CIMR) is in receipt of a Wellcome Trust Strategic Award (100140). We thank M. Simmonds, S. Gough, J. Franklyn, and O. Brand for sharing their autoimmune thyroid disease (AITD) genetic association data set, and we thank all patients with AITD and control subjects for participating in this study. The AITD UK national collection was funded by the Wellcome Trust. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Contributions
J.L. conceived the statistical methods, wrote the software, performed the analyses, analyzed the data, and wrote the manuscript. J.A.T. analyzed the results and edited the manuscript. C.W. conceived the study, analyzed the data, and wrote the manuscript.
Ethics declarations
Competing interests
The JDRF-Wellcome Trust Diabetes and Inflammation Laboratory receives funding from Hoffmann-La Roche and Eli Lilly and Company.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–9, Supplementary Tables 1–9 and Supplementary Note (PDF 9727 kb)
About this article
Cite this article
Liley, J., Todd, J. & Wallace, C. A method for identifying genetic heterogeneity within phenotypically defined disease subgroups. Nat Genet 49, 310–316 (2017). https://doi.org/10.1038/ng.3751
DOI: https://doi.org/10.1038/ng.3751
This article is cited by
- Polygenic profiles define aspects of clinical heterogeneity in attention deficit hyperactivity disorder. Nature Genetics (2024)
- The common rs13266634 C > T variant in SLC30A8 contributes to the heterogeneity of phenotype and clinical features of both type 1 and type 2 diabetic subtypes. Acta Diabetologica (2022)
- A fast wavelet-based functional association analysis replicates several susceptibility loci for birth weight in a Norwegian population. BMC Genomics (2021)
- The associations between three genome-wide risk variants for serum C-peptide of T1D and autoantibody-positive T1D risk, and clinical characteristics in Chinese population. Journal of Human Genetics (2020)