A method for identifying genetic heterogeneity within phenotypically defined disease subgroups

Liley, James; Todd, John A; Wallace, Chris

doi:10.1038/ng.3751

Technical Report
Published: 26 December 2016

Nature Genetics volume 49, pages 310–316 (2017)

Abstract

Many common diseases show wide phenotypic variation. We present a statistical method for determining whether phenotypically defined subgroups of disease cases represent different genetic architectures, in which disease-associated variants have different effect sizes in two subgroups. Our method models the genome-wide distributions of genetic association statistics with mixture Gaussians. We apply a global test without requiring explicit identification of disease-associated variants, thus maximizing power in comparison to standard variant-by-variant subgroup analysis. Where evidence for genetic subgrouping is found, we present methods for post hoc identification of the contributing genetic variants. We demonstrate the method on a range of simulated and test data sets, for which expected results are already known. We investigate subgroups of individuals with type 1 diabetes (T1D) defined by autoantibody positivity, establishing evidence for differential genetic architecture with positivity for thyroid-peroxidase-specific antibody, driven generally by variants in known T1D-associated genomic regions.
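The idea of modelling genome-wide association statistics with a Gaussian mixture and comparing a null against an alternative can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' published implementation: the function names, the use of paired scores (Z_a, Z_d) per SNP, and the reading of σ₂, σ₃, τ, and ρ (symbols that appear in the figure captions but are not defined in this preview) as the entries of zero-mean covariance matrices, with ρ treated as a covariance. The sketch only evaluates a log-likelihood at given parameter values; it does not fit the model or calibrate a test statistic.

```python
import numpy as np


def bivariate_normal_logpdf(z, cov):
    """Log density of a zero-mean bivariate normal at each row of z (n x 2)."""
    z = np.asarray(z, dtype=float)
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum("ni,ij,nj->n", z, inv, z)  # z_i^T inv z_i for each row
    return -0.5 * (quad + logdet + 2.0 * np.log(2.0 * np.pi))


def mixture_loglik(z, weights, sigma2, sigma3, tau, rho):
    """Log-likelihood of (Z_a, Z_d) pairs under an assumed three-category mixture.

    Assumed (hypothetical) parameterisation:
      category 1: identity covariance (no association with either score)
      category 2: variance sigma2^2 in Z_a, unit variance in Z_d
      category 3: variances sigma3^2 and tau^2, covariance +rho or -rho
                  with equal probability
    Requires |rho| < sigma3 * tau so the category 3 covariance is valid.
    """
    cov1 = np.eye(2)
    cov2 = np.diag([sigma2 ** 2, 1.0])
    cov3_pos = np.array([[sigma3 ** 2, rho], [rho, tau ** 2]])
    cov3_neg = np.array([[sigma3 ** 2, -rho], [-rho, tau ** 2]])

    # Equal-weight mix of the two sign choices for category 3.
    log_cat3 = np.logaddexp(
        bivariate_normal_logpdf(z, cov3_pos),
        bivariate_normal_logpdf(z, cov3_neg),
    ) - np.log(2.0)

    log_components = np.stack([
        np.log(weights[0]) + bivariate_normal_logpdf(z, cov1),
        np.log(weights[1]) + bivariate_normal_logpdf(z, cov2),
        np.log(weights[2]) + log_cat3,
    ])
    # Sum over SNPs of the log of the weighted mixture density.
    return np.logaddexp.reduce(log_components, axis=0).sum()


# Usage: one SNP with large scores in both dimensions, evaluated under an
# assumed null (tau = 1, rho = 0) and an assumed alternative.
z_example = np.array([[3.0, 3.0]])
w = [0.8, 0.15, 0.05]
loglik_null = mixture_loglik(z_example, w, sigma2=2.0, sigma3=2.0, tau=1.0, rho=0.0)
loglik_alt = mixture_loglik(z_example, w, sigma2=2.0, sigma3=2.0, tau=2.0, rho=1.0)
log_ratio = loglik_alt - loglik_null
```

In this sketch the difference `log_ratio` plays the role of a likelihood-ratio comparison across all SNPs at once, which is the sense in which a global test avoids naming individual disease-associated variants. A real analysis would maximise both likelihoods over their parameters and calibrate the resulting statistic, which this preview does not describe.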

**Figure 1: Overview of the three-category model.**

**Figure 2: Quantile–quantile plot from simulations demonstrating type 1 error rate control of the PLR test.**

**Figure 3: Observed absolute Z_a and Z_d scores for T1D–RA comparison.**

**Figure 4: The power of PLR testing to reject H₀ (genetic homogeneity between case subgroups) depends on the number of SNPs in category 3 and the underlying values of model parameters σ₂, σ₃, τ, and ρ.**

**Figure 5: Z_a and Z_d scores for age at diagnosis in T1D, excluding the MHC region.**

Acknowledgements

We acknowledge the help of the Diabetes and Inflammation Laboratory Data Service for access and quality control procedures on the data sets used in this study. The JDRF/Wellcome Trust Diabetes and Inflammation Laboratory is in receipt of a Wellcome Trust Strategic Award (107212; J.A.T.) and receives funding from the JDRF (grant 5-SRA-2015-130-A-N; J.A.T.) and the NIHR Cambridge Biomedical Research Centre. The research leading to these results has received funding from the European Union's Seventh Framework Programme (grant FP7/2007-2013; J.A.T.) under grant agreement 241447 (NAIMIT). J.L. is funded by the NIHR Cambridge Biomedical Research Centre and is on the Wellcome Trust PhD program in Mathematical Genomics and Medicine at the University of Cambridge. C.W. is funded by the Wellcome Trust (grants 089989 and 107881) and the MRC (grant MC_UP_1302/5). The Cambridge Institute for Medical Research (CIMR) is in receipt of a Wellcome Trust Strategic Award (100140). We thank M. Simmonds, S. Gough, J. Franklyn, and O. Brand for sharing their AITD genetic association data set and all patients with AITD and control subjects for participating in this study. The AITD UK national collection was funded by the Wellcome Trust. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and Affiliations

Department of Medical Genetics, JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, NIHR Cambridge Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
James Liley, John A Todd & Chris Wallace
Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
James Liley & Chris Wallace
Nuffield Department of Medicine, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
John A Todd
MRC Biostatistics Unit, Institute of Public Health, University Forvie Site, Cambridge, UK
Chris Wallace

Contributions

J.L. conceived the statistical methods, wrote the software, performed the analyses, analyzed the data, and wrote the manuscript. J.A.T. analyzed the results and edited the manuscript. C.W. conceived the study, analyzed the data, and wrote the manuscript.

Corresponding authors

Correspondence to James Liley or Chris Wallace.

Ethics declarations

Competing interests

The JDRF-Wellcome Trust Diabetes and Inflammation Laboratory receives funding from Hoffmann La Roche and Eli Lilly and Company.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9, Supplementary Tables 1–9 and Supplementary Note (PDF 9727 kb)

About this article

Cite this article

Liley, J., Todd, J. & Wallace, C. A method for identifying genetic heterogeneity within phenotypically defined disease subgroups. Nat Genet 49, 310–316 (2017). https://doi.org/10.1038/ng.3751

Received: 02 August 2016
Accepted: 23 November 2016
Published: 26 December 2016
Issue Date: February 2017
DOI: https://doi.org/10.1038/ng.3751
