Case–control association mapping by proxy using family history of disease

Abstract

Collecting cases for case–control genetic association studies can be time-consuming and expensive. In some situations (such as studies of late-onset or rapidly lethal diseases), it may be more practical to identify family members of cases. In randomly ascertained cohorts, replacing cases with their first-degree relatives enables studies of diseases that are absent (or nearly absent) in the cohort. We refer to this approach as genome-wide association study by proxy (GWAX) and apply it to 12 common diseases in 116,196 individuals from the UK Biobank. Meta-analysis with published genome-wide association study summary statistics replicated established risk loci and yielded four newly associated loci for Alzheimer's disease, eight for coronary artery disease and five for type 2 diabetes. In addition to informing disease biology, our results demonstrate the utility of association mapping without directly observing cases. We anticipate that GWAX will prove useful in future genetic studies of complex traits in large population cohorts.
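The relabeling step at the heart of the proxy design can be sketched as follows. The sketch assumes a hypothetical per-participant record with boolean flags for an affected father, mother or sibling; these names are illustrative and are not UK Biobank field names. Anyone with at least one affected first-degree relative becomes a proxy case and everyone else is a control. How directly observed cases are handled is deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Participant:
    """Hypothetical family-history record for one cohort member."""
    sample_id: str
    father_affected: bool
    mother_affected: bool
    sibling_affected: bool


def proxy_status(p: Participant) -> int:
    """1 = proxy case (at least one affected first-degree relative), 0 = control."""
    return int(p.father_affected or p.mother_affected or p.sibling_affected)


def affected_parent_count(p: Participant) -> int:
    """Number of affected parents (0, 1 or 2), e.g. for stratified comparisons."""
    return int(p.father_affected) + int(p.mother_affected)


def label_cohort(people: Iterable[Participant]) -> List[int]:
    """Proxy case-control labels in input order, for a standard association test."""
    return [proxy_status(p) for p in people]


if __name__ == "__main__":
    cohort = [
        Participant("A", father_affected=False, mother_affected=True, sibling_affected=False),
        Participant("B", father_affected=False, mother_affected=False, sibling_affected=False),
    ]
    print(label_cohort(cohort))  # [1, 0]
```

The resulting 0/1 labels can be passed to an ordinary case–control association test. The estimated effect sizes are attenuated relative to a true case–control study, so they need adjustment before being compared with published odds ratios.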

Figure 1: Power of proxy case–control association designs.
Figure 2: Effective sample sizes of case–control versus proxy case–control association designs in the UK Biobank.
Figure 3: Comparison of adjusted odds ratios and previously reported case–control odds ratios at established risk loci for three diseases with publicly available summary statistics.
Figure 4: Manhattan plots of fixed-effects meta-analysis results for Alzheimer's disease, coronary artery disease and type 2 diabetes.
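Figure 4 reports fixed-effects meta-analysis. A minimal sketch of the standard inverse-variance-weighted fixed-effects estimate for a single SNP follows. It assumes each study contributes a log odds ratio and standard error already on the same scale (for proxy case–control results, that would be the adjusted rather than the raw proxy estimate). The function name and example values are illustrative.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def fixed_effects_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Inverse-variance-weighted fixed-effects meta-analysis of one SNP.

    betas: per-study log odds ratios on a common scale.
    ses:   matching standard errors.
    Returns (pooled log OR, pooled SE, two-sided P value).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need at least one study and one SE per beta")
    weights = [1.0 / se ** 2 for se in ses]        # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))            # two-sided normal-theory P value
    return beta, se, p


if __name__ == "__main__":
    # Two hypothetical studies of equal precision: the pooled estimate is their average.
    beta, se, p = fixed_effects_meta([0.1, 0.2], [0.05, 0.05])
    print(f"log OR = {beta:.3f}, SE = {se:.4f}, P = {p:.2e}")
```

With two studies of equal precision, the pooled log OR is their simple average (0.15 here) and the pooled SE shrinks by a factor of √2, to about 0.035.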

Acknowledgements

We thank T. Hayeck and J. Jakobsdottir for comments on a draft of this manuscript. J.Z.L. and J.K.P. are partially supported by the National Institute of Mental Health (NIH grant R01MH106842). This research has been conducted using the UK Biobank Resource.

Author information

Contributions

All authors contributed to the study design and writing, and all approved this manuscript. J.Z.L. performed the statistical analysis.

Corresponding author

Correspondence to Jimmy Z Liu.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Number of cases and proxy cases required to detect association at α = 5 × 10⁻⁸ for case–control and proxy case–control designs.

Each design uses equal numbers of controls and cases (or proxy cases). We set the disease prevalence (K) to 0.1 and varied the true odds ratio (OR) from 1.1 to 1.5. Each panel shows the numbers of cases and proxy cases required for equal power across different minor allele frequencies (MAFs). Across values of K (data shown only for K = 0.1), MAF and OR, the ratio of proxy cases to true cases required was ~4 when proxy cases and controls were perfectly classified (red line). When 10% of controls were misclassified proxy cases, the ratio increased to ~4.9 (blue line).
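The ~4 ratio follows from a simple allele-frequency argument: a first-degree relative of a case carries, on average, half of the case's excess of risk alleles, so the frequency difference from controls roughly halves and the required sample size roughly quadruples. The sketch below illustrates this with a normal-approximation two-proportion allelic test. It is not the power model behind the figure: it ignores prevalence K, assumes controls carry the population allele frequency, and models misclassification as a fraction of controls that are actually proxy cases. All function names are hypothetical.

```python
from statistics import NormalDist


def case_allele_freq(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by an allelic OR relative to controls."""
    odds = odds_ratio * p_control / (1.0 - p_control)
    return odds / (1.0 + odds)


def n_required(p1: float, p0: float, alpha: float = 5e-8, power: float = 0.8) -> float:
    """Individuals per group (equal group sizes) for a two-sided two-proportion allelic test.

    Normal approximation; each individual contributes two alleles.
    """
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    n_alleles = (z_a + z_b) ** 2 * (p1 * (1 - p1) + p0 * (1 - p0)) / (p1 - p0) ** 2
    return n_alleles / 2.0


def proxy_to_case_ratio(maf: float, odds_ratio: float, misclassified: float = 0.0) -> float:
    """Proxy cases needed per true case for equal power (illustrative model).

    Assumptions: controls carry the population frequency `maf`; a first-degree relative
    of a case carries (p_case + maf) / 2; a fraction `misclassified` of controls are
    unrecognized proxy cases.
    """
    p_case = case_allele_freq(maf, odds_ratio)
    p_proxy = 0.5 * (p_case + maf)
    p_ctrl = (1.0 - misclassified) * maf + misclassified * p_proxy
    return n_required(p_proxy, p_ctrl) / n_required(p_case, maf)


if __name__ == "__main__":
    print(round(n_required(case_allele_freq(0.3, 1.2), 0.3)))  # true cases needed
    print(round(proxy_to_case_ratio(0.3, 1.2), 2))
    print(round(proxy_to_case_ratio(0.3, 1.2, misclassified=0.1), 2))
```

For MAF 0.3 and OR 1.2 this toy model gives ratios of roughly 3.9 and 4.9. Because α and power enter both sample sizes through the same factor, they cancel in the ratio and only affect the absolute counts from `n_required`.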

Supplementary Figure 2 Relationship between adjusted ORs and directly observed ORs across the allele frequency spectrum.

The x axis shows the OR estimated directly from case–control association testing when all cases are individuals with one parent (or one full sibling) affected by the disease. The y axis shows the adjusted OR, rescaled to be comparable with the OR estimated directly from true cases and controls.
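Why the relationship depends on allele frequency can be seen from one simple model. Suppose controls carry the population allele frequency and a parent–offspring proxy carries, on average, the midpoint of the case and control frequencies. Then an observed proxy OR can be converted to the allele frequency it implies, mapped back to a case frequency, and re-expressed as an OR. The sketch below does exactly that. It is an illustrative approximation under those assumptions, not necessarily the adjustment used for this figure, and the function names are hypothetical.

```python
def allele_freq_from_or(p_ref: float, odds_ratio: float) -> float:
    """Allele frequency implied by an allelic OR relative to a reference frequency p_ref."""
    odds = odds_ratio * p_ref / (1.0 - p_ref)
    return odds / (1.0 + odds)


def allelic_or(p1: float, p0: float) -> float:
    """Allelic odds ratio comparing frequency p1 with reference frequency p0."""
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


def adjust_proxy_or(or_proxy: float, p_control: float) -> float:
    """Convert a proxy (first-degree relative) OR to a case-control-scale OR.

    Assumes p_proxy = (p_case + p_control) / 2, i.e. the relative carries on average
    half of the case's excess risk-allele frequency.
    """
    p_proxy = allele_freq_from_or(p_control, or_proxy)
    p_case = p_control + 2.0 * (p_proxy - p_control)   # invert the midpoint relation
    if not 0.0 < p_case < 1.0:
        raise ValueError("implied case allele frequency falls outside (0, 1)")
    return allelic_or(p_case, p_control)


if __name__ == "__main__":
    # The same observed proxy OR maps to slightly different adjusted ORs at different MAFs.
    for maf in (0.05, 0.2, 0.5):
        print(maf, round(adjust_proxy_or(1.10, maf), 4))
```

For small effects the result is close to squaring the proxy OR (doubling its log), but the exact mapping shifts with the control allele frequency, which is the kind of frequency dependence the figure examines.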

Supplementary Figure 3 Manhattan plots of primary proxy case–control association analysis results for 12 phenotypes.

Chromosomal positions are plotted on the x axis and strength of association (−log10 P) on the y axis. The red line marks the genome-wide significance threshold of P < 5 × 10⁻⁸. −log10(P) values are truncated at 40 for illustrative purposes.

Supplementary Figure 4 Mean polygenic risk scores among UK Biobank individuals for Alzheimer's disease, coronary artery disease and type 2 diabetes.

Risk scores were calculated from established risk loci and their reported effect sizes, taken from published, publicly available GWAS summary statistics. Error bars denote the 95% confidence intervals of the mean normalized polygenic risk scores. Differences between each pair of risk scores were tested using Welch's t test. For none of the three diseases was a significant difference identified between unaffected individuals with two affected parents and 2× unaffected individuals with one affected parent (P > 0.09).
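The caption combines a weighted risk score, normalization, confidence intervals and Welch's t test. A minimal sketch of those pieces follows. It assumes the score is the sum over established loci of log(OR) times the risk-allele dosage, a common construction; the exact weighting used for the figure is not stated here. The helper names are hypothetical. `welch_t` returns the t statistic and Welch–Satterthwaite degrees of freedom; for a P value, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test.

```python
import math
from statistics import mean, stdev, variance
from typing import Dict, List, Sequence, Tuple


def polygenic_score(dosages: Dict[str, float], log_ors: Dict[str, float]) -> float:
    """Sum over risk loci of log(OR) x risk-allele dosage (0 to 2).

    Raises KeyError if an individual lacks a dosage for any scored SNP.
    """
    return sum(beta * dosages[snp] for snp, beta in log_ors.items())


def normalize(scores: Sequence[float]) -> List[float]:
    """Standardize scores to mean 0 and standard deviation 1 across the cohort."""
    mu, sd = mean(scores), stdev(scores)
    return [(s - mu) / sd for s in scores]


def mean_ci95(scores: Sequence[float]) -> Tuple[float, float]:
    """Normal-approximation 95% confidence interval for the mean score."""
    m = mean(scores)
    half = 1.96 * stdev(scores) / math.sqrt(len(scores))
    return m - half, m + half


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)   # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Welch's variant is the natural choice here because groups defined by family history differ greatly in size and need not share a variance.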

Supplementary Figure 5 Regional association plots of four novel Alzheimer’s disease risk loci.

The coordinates on the x axis are GRCh37. SNPs are plotted according to their base-pair position and strength of association. The color of each point indicates the degree of linkage disequilibrium with the index SNP (in blue).

Supplementary Figure 6 Regional association plots of eight novel coronary artery disease risk loci.

SNPs are plotted according to their base-pair position and strength of association. The color of each point indicates the degree of linkage disequilibrium with the index SNP (in blue).

Supplementary Figure 7 Regional association plots of five novel type 2 diabetes risk loci.

SNPs are plotted according to their base-pair position and strength of association. The color of each point indicates the degree of linkage disequilibrium with the index SNP (in blue).
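Each regional plot colors SNPs by their linkage disequilibrium with the index SNP. A minimal sketch of the usual genotype-based measure, r² as the squared Pearson correlation of allele dosages, follows. It assumes dosage vectors for the same individuals in the same order. Whether the figures used study genotypes or a reference panel is not stated, and the function names are hypothetical. At genome scale, a tool such as PLINK (`--r2`) would normally be used instead.

```python
from typing import Dict, Sequence


def ld_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two SNPs' dosage vectors (genotype-based r^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length dosage vectors with n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a monomorphic SNP has undefined r^2
    return sxy * sxy / (sxx * syy)


def r2_with_index(index: Sequence[float], others: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """r^2 of every SNP in a region with the index SNP, keyed by SNP ID."""
    return {snp: ld_r2(index, dosages) for snp, dosages in others.items()}
```

Because r² is squared, SNPs in perfect negative correlation with the index SNP (dosages that mirror it) also receive r² = 1.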

Supplementary Figure 8 Quantile–quantile plots and genomic inflation factors for the primary proxy case–control association analyses of 12 phenotypes.

The dashed red line corresponds to y = x.
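Genomic inflation is commonly summarized by λGC, the median observed association χ² statistic divided by the median of a 1-degree-of-freedom χ² distribution (about 0.455). A minimal sketch computing it from two-sided P values with only the standard library follows; the function name is hypothetical, and every P value must lie in (0, 1].

```python
from statistics import NormalDist, median
from typing import Sequence

# Median of a chi-square distribution with 1 df: (Phi^-1(0.75))^2, about 0.4549.
MEDIAN_CHI2_1DF = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values: Sequence[float]) -> float:
    """lambda_GC = median observed 1-df chi-square / median expected under the null.

    Two-sided P values are converted to chi-square via z = Phi^-1(p / 2), chi2 = z^2.
    Every P value must lie in (0, 1].
    """
    chi2 = [NormalDist().inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / MEDIAN_CHI2_1DF
```

Values near 1 indicate little inflation. For highly polygenic traits in large samples, λGC can exceed 1 without confounding, which is one reason LD score regression intercepts are also reported.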

Supplementary Figure 9 Quantile–quantile plots, genomic inflation factors and LD score regression intercepts for the fixed-effects meta-analyses of four phenotypes.

The dashed red line corresponds to y = x.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9, Supplementary Tables 1, 3 and 5, and Supplementary Note (PDF 2539 kb)

Supplementary Tables 2 and 4

Supplementary Tables 2 and 4 (XLSX 74 kb)

About this article

Cite this article

Liu, J., Erlich, Y. & Pickrell, J. Case–control association mapping by proxy using family history of disease. Nat Genet 49, 325–331 (2017). https://doi.org/10.1038/ng.3766
