Abstract
Collecting cases for case–control genetic association studies can be time-consuming and expensive. In some situations (such as studies of late-onset or rapidly lethal diseases), it may be more practical to identify family members of cases. In randomly ascertained cohorts, replacing cases with their first-degree relatives enables studies of diseases that are absent (or nearly absent) in the cohort. We refer to this approach as genome-wide association study by proxy (GWAX) and apply it to 12 common diseases in 116,196 individuals from the UK Biobank. Meta-analysis with published genome-wide association study summary statistics replicated established risk loci and yielded four newly associated loci for Alzheimer's disease, eight for coronary artery disease and five for type 2 diabetes. In addition to informing disease biology, our results demonstrate the utility of association mapping without directly observing cases. We anticipate that GWAX will prove useful in future genetic studies of complex traits in large population cohorts.
Acknowledgements
We thank T. Hayeck and J. Jakobsdottir for comments on a draft of this manuscript. J.Z.L. and J.K.P. are partially supported by the National Institute of Mental Health (NIH grant R01MH106842). This research has been conducted using the UK Biobank Resource.
Author information
Contributions
All authors contributed to the study design and writing, and all approved this manuscript. J.Z.L. performed the statistical analysis.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Number of cases and proxy cases required to detect association at α = 5 × 10⁻⁸ for case–control and proxy case–control designs.
The ratio of controls to cases (or proxy cases) is 1. We set the disease prevalence (K) to 0.1 and varied the true odds ratio (OR) from 1.1 to 1.5. Each panel shows the equivalent number of cases and proxy cases required across different minor allele frequencies (MAFs). Across values of K (data shown only for K = 0.1), MAF and OR, about four proxy cases were required for every true case when proxy cases and controls were perfectly classified (red line). When 10% of controls were misclassified proxy cases, this ratio increased to ~4.9 (blue line).
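The roughly fourfold ratio can be motivated with a back-of-the-envelope calculation. The sketch below is not the authors' power calculation. It assumes, as a simplification of ours, that a proxy case carries one allele drawn at the case frequency and one at the control frequency, so the allele-frequency difference against controls is halved. The function names, the 80% power setting and the control allele frequency of 0.2 are illustrative assumptions.

```python
from statistics import NormalDist


def required_n(freq_a, freq_b, alpha=5e-8, power=0.8):
    """Individuals per group for a two-sided allelic test of freq_a vs freq_b.

    Assumes equal group sizes and a normal approximation; power=0.8 is an
    illustrative choice, not a value taken from the figure legend.
    """
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    variance = freq_a * (1 - freq_a) + freq_b * (1 - freq_b)
    # Each individual contributes two alleles, hence the factor of 2.
    return z_total ** 2 * variance / (2 * (freq_a - freq_b) ** 2)


def proxy_to_case_ratio(control_freq, odds_ratio, misclassified=0.0):
    """Proxy cases needed per true case under the halved-difference assumption."""
    # Case allele frequency implied by the allelic odds ratio.
    case_freq = odds_ratio * control_freq / (1 - control_freq + odds_ratio * control_freq)
    # Assumption: a proxy case sits midway between cases and controls.
    proxy_freq = 0.5 * (case_freq + control_freq)
    # A fraction of the controls in the proxy design are really proxy cases.
    proxy_control_freq = (1 - misclassified) * control_freq + misclassified * proxy_freq
    n_proxy = required_n(proxy_freq, proxy_control_freq)
    n_case = required_n(case_freq, control_freq)
    return n_proxy / n_case


if __name__ == "__main__":
    print(round(proxy_to_case_ratio(0.2, 1.2), 2))
    print(round(proxy_to_case_ratio(0.2, 1.2, misclassified=0.1), 2))
```

Under these assumptions the ratio comes out close to 3.9 with perfect classification and close to 4.8 with 10% misclassified controls, broadly in line with the values quoted in the legend.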
Supplementary Figure 2 Relationship between adjusted ORs and directly observed ORs across the allele frequency spectrum.
The x axis denotes OR estimated directly from case–control association testing when cases consist entirely of individuals with one parent (or one full sibling) affected with a disease. The y axis denotes the adjusted OR such that it is comparable to OR estimated directly using true cases and controls.
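One way to see why such an adjustment depends on allele frequency is sketched below. This is an illustration under a stated assumption, not the adjustment used in the paper: it assumes the risk-allele frequency in proxy cases lies midway between the frequencies in true cases and in controls. The function name `adjust_proxy_or` and its parameters are hypothetical.

```python
def adjust_proxy_or(proxy_or, control_freq):
    """Convert an OR estimated from proxy cases to a case-equivalent OR.

    Assumption (ours): proxy_freq = (case_freq + control_freq) / 2.
    """
    control_odds = control_freq / (1 - control_freq)
    # Allele frequency in proxy cases implied by the observed proxy OR.
    proxy_odds = proxy_or * control_odds
    proxy_freq = proxy_odds / (1 + proxy_odds)
    # Invert the midpoint assumption to recover the case frequency.
    case_freq = 2 * proxy_freq - control_freq
    if not 0 < case_freq < 1:
        raise ValueError("implied case allele frequency is outside (0, 1)")
    return (case_freq / (1 - case_freq)) / control_odds


if __name__ == "__main__":
    # Compare with the simpler rule of thumb of doubling the log OR.
    print(adjust_proxy_or(1.1, 0.2), 1.1 ** 2)
```

For small effects the result is close to doubling the log OR (squaring the OR), with a deviation that varies with the control allele frequency.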
Supplementary Figure 3 Manhattan plots of primary proxy case–control association analysis results for 12 phenotypes.
Chromosomes and positions are plotted on the x axis, and strength of association on the y axis. The red line marks the genome-wide significance threshold of P < 5 × 10⁻⁸. Values of −log₁₀(P) are truncated at 40 for illustrative purposes.
Supplementary Figure 4 Mean polygenic risk scores among UK Biobank individuals for Alzheimer's disease, coronary artery disease and type 2 diabetes.
Risk scores were calculated using lists of established risk loci and reported effect sizes extracted from publicly available published GWAS summary statistics. Error bars denote the 95% confidence intervals of the mean normalized polygenic risk scores. Differences between each pair of risk scores were tested using Welch’s t test. No significant difference was identified between the scores of unaffected individuals with two affected parents and 2× the scores of unaffected individuals with one affected parent (P > 0.09).
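For readers unfamiliar with Welch’s t test, a minimal sketch of the statistic follows. It uses only the Python standard library and returns the t statistic and the Welch–Satterthwaite degrees of freedom; converting these to a P value requires a t-distribution CDF from a statistics package. The function name and the toy inputs are illustrative and are not data from the study.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's unequal-variance t statistic and approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    # Welch–Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


if __name__ == "__main__":
    # Toy numbers only, chosen to make the arithmetic easy to check.
    print(welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```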
Supplementary Figure 5 Regional association plots of four novel Alzheimer’s disease risk loci.
The coordinates on the x axis are GRCh37. SNPs are plotted according to their base-pair position and strength of association. The color of each point indicates the degree of linkage disequilibrium with the index SNP (in blue).
Supplementary Figure 6 Regional association plots of eight novel coronary artery disease risk loci.
SNPs are plotted according to their base-pair position and strength of association. The color of each point indicates the degree of linkage disequilibrium with the index SNP (in blue).
Supplementary Figure 7 Regional association plots of five novel type 2 diabetes risk loci.
SNPs are plotted according to their base-pair position and strength of association. The color of each point indicates the degree of linkage disequilibrium with the index SNP (in blue).
Supplementary Figure 8 Quantile–quantile plot and genomic inflation of primary proxy case–control association analysis results for 12 phenotypes.
The dashed red line corresponds to y = x.
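Genomic inflation is commonly summarized by λ, the median observed association statistic divided by the median expected under the null. The sketch below shows that standard calculation for two-sided P values from 1-degree-of-freedom tests; it is a generic illustration and may differ in detail from how the values in this figure were computed. The function name is hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Genomic inflation factor from two-sided P values, each in (0, 1]."""
    z = NormalDist()
    # Convert each P value back to a 1-df chi-square statistic.
    chi2 = [z.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Evenly spaced P values mimic a well-calibrated null, so lambda is near 1.
    null_p = [(i + 0.5) / 1000 for i in range(1000)]
    print(genomic_inflation(null_p))
```

Values of λ well above 1 indicate that test statistics are larger than expected, which can reflect confounding or true polygenic signal.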
Supplementary Figure 9 Quantile–quantile plot, genomic inflation and LD score regression intercepts of fixed-effects meta-analysis results for four phenotypes.
The dashed red line corresponds to y = x.
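A fixed-effects meta-analysis is usually carried out by inverse-variance weighting of per-study effect estimates. The sketch below shows that generic approach for log odds ratios and their standard errors; the function name and the example numbers are illustrative, and the study's exact procedure (including any rescaling of proxy estimates) is not reproduced here.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effects estimate, its SE and two-sided P."""
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1 / total_weight)
    z_score = beta / se
    # erfc gives the two-sided normal tail without losing precision for large |z|.
    p_value = math.erfc(abs(z_score) / math.sqrt(2))
    return beta, se, p_value


if __name__ == "__main__":
    # Two hypothetical studies reporting log odds ratios for the same variant.
    print(fixed_effects_meta([0.10, 0.20], [0.05, 0.05]))
```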
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–9, Supplementary Tables 1, 3 and 5, and Supplementary Note (PDF 2539 kb)
Supplementary Tables 2 and 4
Supplementary Tables 2 and 4 (XLSX 74 kb)
About this article
Cite this article
Liu, J., Erlich, Y. & Pickrell, J. Case–control association mapping by proxy using family history of disease. Nat Genet 49, 325–331 (2017). https://doi.org/10.1038/ng.3766
DOI: https://doi.org/10.1038/ng.3766