Case–control association mapping by proxy using family history of disease

Liu, Jimmy Z; Erlich, Yaniv; Pickrell, Joseph K

doi:10.1038/ng.3766

Analysis
Published: 16 January 2017

Nature Genetics volume 49, pages 325–331 (2017)

Abstract

Collecting cases for case–control genetic association studies can be time-consuming and expensive. In some situations (such as studies of late-onset or rapidly lethal diseases), it may be more practical to identify family members of cases. In randomly ascertained cohorts, replacing cases with their first-degree relatives enables studies of diseases that are absent (or nearly absent) in the cohort. We refer to this approach as genome-wide association study by proxy (GWAX) and apply it to 12 common diseases in 116,196 individuals from the UK Biobank. Meta-analysis with published genome-wide association study summary statistics replicated established risk loci and yielded four newly associated loci for Alzheimer's disease, eight for coronary artery disease and five for type 2 diabetes. In addition to informing disease biology, our results demonstrate the utility of association mapping without directly observing cases. We anticipate that GWAX will prove useful in future genetic studies of complex traits in large population cohorts.

**Figure 1: Power of proxy case–control association designs.**

**Figure 2: Effective sample sizes of case–control versus proxy case–control association designs in the UK Biobank.**

**Figure 3: Comparison of adjusted odds ratios and previously reported case–control odds ratios at established risk loci for three diseases with publicly available summary statistics.**

**Figure 4: Manhattan plots of fixed-effects meta-analysis results for Alzheimer's disease, coronary artery disease and type 2 diabetes.**

Acknowledgements

We thank T. Hayeck and J. Jakobsdottir for comments on a draft of this manuscript. J.Z.L. and J.K.P. are partially supported by the National Institute of Mental Health (NIH grant R01MH106842). This research has been conducted using the UK Biobank Resource.

Author information

Authors and Affiliations

New York Genome Center, New York, New York, USA
Jimmy Z Liu, Yaniv Erlich & Joseph K Pickrell
Department of Computer Science, Columbia University, New York, New York, USA
Yaniv Erlich
Department of Biological Sciences, Columbia University, New York, New York, USA
Joseph K Pickrell

Contributions

All authors contributed to the study design and writing, and all approved this manuscript. J.Z.L. performed the statistical analysis.

Corresponding author

Correspondence to Jimmy Z Liu.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Number of cases and proxy cases required to detect association at α = 5 × 10^–8 for case–control and proxy case–control designs.

The ratio of controls to cases (or proxy cases) is 1. We set the disease prevalence (K) to 0.1 and varied the true odds ratio (OR) from 1.1 to 1.5. Each panel shows the equivalent number of cases and proxy cases required across different minor allele frequencies (MAFs). Across values of K (data only shown for K = 0.1), MAF and OR, the ratio of true cases to proxy cases was ~4 when proxy cases and controls are perfectly classified (red line). When 10% of controls consist of misclassified proxy cases, the ratio increases to ~4.9 (blue line).
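The roughly fourfold ratio can be reproduced with a back-of-the-envelope calculation. The sketch below is not the authors' power calculation. It assumes a two-sided allelic z-test with equal group sizes, random mating, a per-allele odds ratio, and that the expected allele frequency in a first-degree relative lies midway between that of the index relative and the population. It also treats the supplied frequency as the allele frequency in true controls. All function and parameter names are illustrative.

```python
from statistics import NormalDist


def required_n(freq_a, freq_b, alpha=5e-8, power=0.8):
    """Individuals per group (equal group sizes) for a two-sided allelic z-test."""
    z_alpha = -NormalDist().inv_cdf(alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = freq_a * (1 - freq_a) + freq_b * (1 - freq_b)
    # Each individual contributes two alleles, hence the factor 2.
    return (z_alpha + z_beta) ** 2 * variance / (2 * (freq_a - freq_b) ** 2)


def proxy_to_case_ratio(control_freq, odds_ratio, prevalence, misclassified=0.0):
    """Ratio of proxy cases to true cases needed for the same power.

    Assumption: proxy cases have one affected first-degree relative and
    proxy controls have one unaffected first-degree relative.
    """
    odds = odds_ratio * control_freq / (1 - control_freq)
    case_freq = odds / (1 + odds)
    pop_freq = prevalence * case_freq + (1 - prevalence) * control_freq

    # A relative shares half of the index individual's alleles on average.
    proxy_case = (case_freq + pop_freq) / 2
    proxy_control = (control_freq + pop_freq) / 2

    # Optionally contaminate the controls with unrecognized proxy cases.
    proxy_control = (1 - misclassified) * proxy_control + misclassified * proxy_case

    return required_n(proxy_case, proxy_control) / required_n(case_freq, control_freq)


ratio_clean = proxy_to_case_ratio(0.2, 1.2, 0.1)
ratio_noisy = proxy_to_case_ratio(0.2, 1.2, 0.1, misclassified=0.1)
```

Under these assumptions the allele frequency difference between proxy groups is half that between true cases and controls, so the required sample size grows by a factor just under 4. Setting `misclassified=0.1` shrinks the difference by a further 10% and pushes the factor to just under 5. The ratio does not depend on the chosen power.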

Supplementary Figure 2 Relationship between adjusted ORs and directly observed ORs across the allele frequency spectrum.

The x axis denotes OR estimated directly from case–control association testing when cases consist entirely of individuals with one parent (or one full sibling) affected with a disease. The y axis denotes the adjusted OR such that it is comparable to OR estimated directly using true cases and controls.
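To illustrate why an adjustment is needed, the sketch below uses a common approximation from proxy-case analyses: doubling the log odds ratio (equivalently, squaring the odds ratio). This is not necessarily the exact adjustment used for this figure. The simulated proxy odds ratio assumes random mating, a per-allele odds ratio, and that a first-degree relative's expected allele frequency is the average of the index relative's frequency and the population frequency. Function names are hypothetical.

```python
def expected_proxy_or(control_freq, odds_ratio, prevalence):
    """Expected allelic OR when cases and controls are replaced by one relative each.

    Assumption: the relative's allele frequency is the midpoint between the
    index individual's group frequency and the population frequency.
    """
    odds = odds_ratio * control_freq / (1 - control_freq)
    case_freq = odds / (1 + odds)
    pop_freq = prevalence * case_freq + (1 - prevalence) * control_freq

    proxy_case = (case_freq + pop_freq) / 2
    proxy_control = (control_freq + pop_freq) / 2

    return (proxy_case / (1 - proxy_case)) / (proxy_control / (1 - proxy_control))


def adjusted_odds_ratio(proxy_odds_ratio):
    """Approximate case-control OR: double the log OR, i.e. square the OR."""
    return proxy_odds_ratio ** 2


proxy_or = expected_proxy_or(control_freq=0.2, odds_ratio=1.2, prevalence=0.1)
recovered_or = adjusted_odds_ratio(proxy_or)
```

With a true odds ratio of 1.2, the expected proxy odds ratio under these assumptions is attenuated to roughly 1.1, and squaring it recovers a value close to 1.2. The approximation is expected to degrade for large effects.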

Supplementary Figure 3 Manhattan plots of primary proxy case–control association analysis results for 12 phenotypes.

Chromosome and positions are plotted on the x axis. Strength of association is plotted on the y axis. The red line corresponds to the genome-wide significant threshold of P < 5 × 10^–8. −log₁₀ (P values) are truncated at 40 for illustrative purposes.
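The y-axis transformation described in this caption can be sketched as follows. Only the threshold (5 × 10^–8) and the cap (40) come from the caption. The function and constant names are illustrative, and mapping a P value of zero to the cap is an assumption made here to handle numerical underflow.

```python
import math

GENOME_WIDE_P = 5e-8   # significance threshold from the caption
Y_CAP = 40.0           # truncation value from the caption


def manhattan_y(p_value, cap=Y_CAP):
    """Return -log10(P), truncated at `cap` for display."""
    if p_value <= 0.0:
        # Assumption: P values that underflow to zero are drawn at the cap.
        return cap
    return min(-math.log10(p_value), cap)


# Height of the red significance line on the same scale.
threshold_y = -math.log10(GENOME_WIDE_P)
```

On this scale the significance line sits at about 7.3, and any association with P below 10^–40 is drawn at 40.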

Supplementary Figure 4 Mean polygenic risk scores among UK Biobank individuals for Alzheimer's disease, coronary artery disease and type 2 diabetes.

Risk scores were calculated using lists of established risk loci and reported effect sizes extracted from publicly available published GWAS summary statistics. Error bars denote the 95% confidence intervals of the mean normalized polygenic risk scores. Differences between each pair of risk scores were tested using Welch’s t test. No significant difference was identified between unaffected individuals with two affected parents and 2× unaffected individuals with one affected parent (P > 0.09).
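For reference, Welch's t test compares two group means without assuming equal variances. The minimal sketch below computes the test statistic and the Welch–Satterthwaite degrees of freedom using only the Python standard library. The input values in the usage example are made up for illustration and are not data from the study.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b

    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Illustrative values only.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

A two-sided P value follows from comparing `t_stat` with a t distribution on `dof` degrees of freedom, for example with a statistics library that provides the t distribution.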

Supplementary Figure 8 Quantile–quantile plot and genomic inflation of primary proxy case–control association analysis results for 12 phenotypes.

The dashed red line corresponds to y = x.
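Genomic inflation is conventionally summarized by the factor λ, the median observed 1-d.f. χ² statistic divided by its expected median under the null (about 0.455). The sketch below uses this standard definition and assumes two-sided P values from 1-d.f. tests that lie strictly between 0 and 1. It is not taken from the authors' code, and the names are illustrative.

```python
from statistics import NormalDist, median

_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 d.f. (approximately 0.4549).
CHI2_1DF_MEDIAN = _NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Genomic inflation factor from two-sided P values of 1-d.f. tests."""
    # Convert each P value back to a chi-square statistic via its z score.
    chi2_stats = [_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Evenly spaced P values mimic a null distribution, so lambda should be 1.
null_p = [(i - 0.5) / 1001 for i in range(1, 1002)]
lambda_null = genomic_inflation(null_p)
```

Values of λ near 1 indicate test statistics that follow the null distribution on the whole. Values above 1 can reflect either confounding or genuine polygenic signal.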

Supplementary Figure 9 Quantile–quantile plot, genomic inflation and LD score regression intercepts of fixed-effects meta-analysis results for four phenotypes.

The dashed red line corresponds to y = x.
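A fixed-effects meta-analysis is commonly computed by inverse-variance weighting of per-study effect estimates. The sketch below shows that standard calculation for a single variant. It assumes the inputs are log odds ratios on a common scale with respect to the same allele (for proxy results, after any adjustment), and it is not drawn from the authors' pipeline. Names and example values are illustrative.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effects estimate for one variant.

    Returns the combined effect, its standard error and the z score.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)

    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se, beta / se


# Illustrative values only: two studies with equal precision.
beta_meta, se_meta, z_meta = fixed_effects_meta([0.1, 0.2], [0.1, 0.1])
```

With equal standard errors the combined estimate is the simple average of the two effects, and its standard error shrinks by a factor of √2.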

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9, Supplementary Tables 1, 3 and 5, and Supplementary Note (PDF 2539 kb)

Supplementary Tables 2 and 4

Supplementary Tables 2 and 4 (XLSX 74 kb)

About this article

Cite this article

Liu, J., Erlich, Y. & Pickrell, J. Case–control association mapping by proxy using family history of disease. Nat Genet 49, 325–331 (2017). https://doi.org/10.1038/ng.3766

Received: 25 March 2016
Accepted: 14 December 2016
Published: 16 January 2017
Issue Date: March 2017
DOI: https://doi.org/10.1038/ng.3766

Supplementary Figure 5 Regional association plots of four novel Alzheimer’s disease risk loci.

Supplementary Figure 6 Regional association plots of eight novel coronary artery disease risk loci.

Supplementary Figure 7 Regional association plots of five novel type 2 diabetes risk loci.