  • Analysis
Case–control association mapping by proxy using family history of disease

Abstract

Collecting cases for case–control genetic association studies can be time-consuming and expensive. In some situations (such as studies of late-onset or rapidly lethal diseases), it may be more practical to identify family members of cases. In randomly ascertained cohorts, replacing cases with their first-degree relatives enables studies of diseases that are absent (or nearly absent) in the cohort. We refer to this approach as genome-wide association study by proxy (GWAX) and apply it to 12 common diseases in 116,196 individuals from the UK Biobank. Meta-analysis with published genome-wide association study summary statistics replicated established risk loci and yielded four newly associated loci for Alzheimer's disease, eight for coronary artery disease and five for type 2 diabetes. In addition to informing disease biology, our results demonstrate the utility of association mapping without directly observing cases. We anticipate that GWAX will prove useful in future genetic studies of complex traits in large population cohorts.

Figure 1: Power of proxy case–control association designs.
Figure 2: Effective sample sizes of case–control versus proxy case–control association designs in the UK Biobank.
Figure 3: Comparison of adjusted odds ratios and previously reported case–control odds ratios at established risk loci for three diseases with publicly available summary statistics.
Figure 4: Manhattan plots of fixed-effects meta-analysis results for Alzheimer's disease, coronary artery disease and type 2 diabetes.
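
The fixed-effects meta-analysis named in the Figure 4 caption can be illustrated with a minimal inverse-variance-weighted sketch. This is a generic textbook formulation, not necessarily the exact procedure used by the authors; the function name `fixed_effects_meta` and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects combination of per-study log ORs."""
    weights = [1.0 / se ** 2 for se in ses]          # precision of each study
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)                           # standard error of pooled effect
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))              # two-sided normal P value
    return beta, se, z, p


# Hypothetical inputs: log OR and standard error for one SNP in two studies.
beta, se, z, p = fixed_effects_meta([0.10, 0.20], [0.02, 0.04])
print(f"pooled log OR = {beta:.3f}, SE = {se:.4f}, z = {z:.2f}, P = {p:.2e}")
```

The more precise study dominates the pooled estimate, which is why the pooled log OR sits closer to 0.10 than to 0.20 in this toy example.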

Acknowledgements

We thank T. Hayeck and J. Jakobsdottir for comments on a draft of this manuscript. J.Z.L. and J.K.P. are partially supported by the National Institute of Mental Health (NIH grant R01MH106842). This research has been conducted using the UK Biobank Resource.

Author information

Contributions

All authors contributed to the study design and writing, and all approved this manuscript. J.Z.L. performed the statistical analysis.

Corresponding author

Correspondence to Jimmy Z Liu.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Number of cases and proxy cases required to detect association at α = 5 × 10⁻⁸ for case–control and proxy case–control designs.

The ratio of controls to cases (or proxy cases) is 1. We set the disease prevalence (K) to 0.1 and varied the true odds ratio (OR) from 1.1 to 1.5. Each panel shows the equivalent number of cases and proxy cases required across different minor allele frequencies (MAFs). Across values of K (data shown only for K = 0.1), MAF and OR, about four proxy cases were required for every true case when proxy cases and controls were perfectly classified (red line). When 10% of controls consisted of misclassified proxy cases, this ratio increased to ~4.9 (blue line).
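
The factor of about 4 can be reproduced approximately with a short sketch. It rests on simplifying assumptions that are not stated in the caption: a multiplicative allelic model, controls carrying the risk allele at the population MAF (so prevalence K is ignored), and a proxy case inheriting one allele from an affected parent and one from the general population. The function names are hypothetical, and the sample-size formula is the standard two-proportion z test applied to allele counts.

```python
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases under a multiplicative allelic model."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def n_required(freq_a, freq_b, alpha=5e-8, power=0.8):
    """Individuals per group (1:1 ratio) for a two-sided allelic z test."""
    z_alpha = -NormalDist().inv_cdf(alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    variance = freq_a * (1.0 - freq_a) + freq_b * (1.0 - freq_b)
    # Each individual contributes two alleles, hence the division by 2.
    return (z_alpha + z_power) ** 2 * variance / (2.0 * (freq_a - freq_b) ** 2)


def proxy_to_case_ratio(maf, odds_ratio):
    """Proxy cases needed per true case under the assumptions stated above."""
    p_control = maf
    p_case = case_allele_freq(p_control, odds_ratio)
    # Assumption: half of a proxy case's alleles come from an affected parent.
    p_proxy = (p_case + p_control) / 2.0
    return n_required(p_proxy, p_control) / n_required(p_case, p_control)


for maf in (0.05, 0.2, 0.4):
    for odds_ratio in (1.1, 1.3, 1.5):
        print(maf, odds_ratio, round(proxy_to_case_ratio(maf, odds_ratio), 2))
```

Halving the allele-frequency difference between cases and controls quadruples the sample size needed for the same power, which is where the factor close to 4 comes from in this simplified setting.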

Supplementary Figure 2 Relationship between adjusted ORs and directly observed ORs across the allele frequency spectrum.

The x axis denotes OR estimated directly from case–control association testing when cases consist entirely of individuals with one parent (or one full sibling) affected with a disease. The y axis denotes the adjusted OR such that it is comparable to OR estimated directly using true cases and controls.
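
One simple way to put a proxy odds ratio on the case–control scale is to double its logarithm, that is, to square the proxy OR. The sketch below checks this first-order approximation under assumptions that do not come from the caption: a multiplicative allelic model, controls at the population MAF, and proxy cases with exactly one affected parent. The function names are hypothetical, and the adjustment actually plotted in the figure may differ.

```python
from math import exp, log


def allelic_odds(freq):
    """Odds of carrying the risk allele at a given allele frequency."""
    return freq / (1.0 - freq)


def proxy_odds_ratio(maf, true_or):
    """OR expected when cases are replaced by children of one affected parent."""
    case_odds = true_or * allelic_odds(maf)
    p_case = case_odds / (1.0 + case_odds)
    # Assumption: one allele from the affected parent, one from the population.
    p_proxy = (p_case + maf) / 2.0
    return allelic_odds(p_proxy) / allelic_odds(maf)


def adjusted_odds_ratio(proxy_or):
    """First-order adjustment: double the log OR."""
    return exp(2.0 * log(proxy_or))


for maf in (0.1, 0.2, 0.4):
    observed = proxy_odds_ratio(maf, 1.2)
    print(maf, round(observed, 4), round(adjusted_odds_ratio(observed), 4))
```

In this toy setting the adjusted values land within a few percent of the true OR of 1.2, with the approximation degrading slightly for larger effects and rarer alleles.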

Supplementary Figure 3 Manhattan plots of primary proxy case–control association analysis results for 12 phenotypes.

Chromosome and positions are plotted on the x axis. Strength of association is plotted on the y axis. The red line corresponds to the genome-wide significance threshold of P < 5 × 10⁻⁸. −log₁₀(P) values are truncated at 40 for illustrative purposes.

Supplementary Figure 4 Mean polygenic risk scores among UK Biobank individuals for Alzheimer's disease, coronary artery disease and type 2 diabetes.

Risk scores were calculated using lists of established risk loci and reported effect sizes extracted from publicly available published GWAS summary statistics. Error bars denote the 95% confidence intervals of the mean normalized polygenic risk scores. Differences between each pair of risk scores were tested using Welch’s t test. No significant differences were identified between unaffected individuals with two affected parents and 2× unaffected individuals with one affected parent (P > 0.09).
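
Welch’s t test, used in this caption to compare mean risk scores between groups, can be sketched with the standard library alone. The function names and toy data are hypothetical. The P value uses a normal approximation, which is reasonable only for large groups; an exact P value would need the t distribution, which the standard library does not provide.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a   # squared standard error of mean A
    se2_b = variance(sample_b) / n_b   # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


def approx_two_sided_p(t):
    """Large-sample normal approximation to the two-sided P value."""
    return 2.0 * NormalDist().cdf(-abs(t))


# Hypothetical normalized risk scores for two small groups.
t, df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df, approx_two_sided_p(t))
```

Unlike Student’s t test, this version does not assume equal variances in the two groups, which matters when group sizes differ as much as they typically do between family-history categories.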

Supplementary Figure 5 Regional association plots of four novel Alzheimer’s disease risk loci.

The coordinates on the x axis are GRCh37. SNPs are plotted according to their base-pair position and strength of association. The color of each point indicates the degree of linkage disequilibrium with the index SNP (in blue).

Supplementary Figure 6 Regional association plots of eight novel coronary artery disease risk loci.

SNPs are plotted according to their base-pair position and strength of association. The color of each point indicates the degree of linkage disequilibrium with the index SNP (in blue).

Supplementary Figure 7 Regional association plots of five novel type 2 diabetes risk loci.

SNPs are plotted according to their base-pair position and strength of association. The color of each point indicates the degree of linkage disequilibrium with the index SNP (in blue).

Supplementary Figure 8 Quantile–quantile plot and genomic inflation of primary proxy case–control association analysis results for 12 phenotypes.

The dashed red line corresponds to y = x.
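
Genomic inflation is commonly summarized as λ, the median observed 1-degree-of-freedom χ² statistic divided by its expected median under the null. The sketch below uses that common definition, which is assumed here rather than taken from the caption; the function name and the evenly spaced test P values are hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Lambda: median observed 1-df chi-square over the null median."""
    norm = NormalDist()
    # Convert each two-sided P value (0 < P <= 1) to a 1-df chi-square statistic.
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    null_median = norm.inv_cdf(0.25) ** 2   # about 0.455
    return median(chi2) / null_median


# Hypothetical null-like input: evenly spaced P values on (0, 1).
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(uniform_p), 3))
```

Values near 1 correspond to points following the y = x line in a quantile–quantile plot, while values well above 1 indicate inflation from confounding, polygenicity or both.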

Supplementary Figure 9 Quantile–quantile plot, genomic inflation and LD score regression intercepts of fixed-effects meta-analysis results for four phenotypes.

The dashed red line corresponds to y = x.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9, Supplementary Tables 1, 3 and 5, and Supplementary Note (PDF 2539 kb)

Supplementary Tables 2 and 4

Supplementary Tables 2 and 4 (XLSX 74 kb)

About this article

Cite this article

Liu, J., Erlich, Y. & Pickrell, J. Case–control association mapping by proxy using family history of disease. Nat Genet 49, 325–331 (2017). https://doi.org/10.1038/ng.3766

  • DOI: https://doi.org/10.1038/ng.3766
